Abstract
The multi-domain non-structural protein 3 (Nsp3) is the largest protein
encoded by the coronavirus (CoV) genome, with an average molecular mass
of about 200 kD. Nsp3 is an essential component of the
replication/transcription complex. It comprises various domains, the
organization of which differs between CoV genera, due to duplication or
absence of some domains. However, eight domains of Nsp3 exist in all known
CoVs: the ubiquitin-like domain 1 (Ubl1), the Glu-rich acidic domain (also
called “hypervariable region”), a macrodomain (also named “X domain”), the
ubiquitin-like domain 2 (Ubl2), the papain-like protease 2 (PL2pro), the Nsp3
ectodomain (3Ecto, also called “zinc finger domain”), as well as the domains
Y1 and CoV-Y of unknown functions. In addition, the two transmembrane
regions, TM1 and TM2, exist in all CoVs. The three-dimensional structures of
domains in the N-terminal two thirds of Nsp3 have been investigated by X-ray
crystallography and/or nuclear magnetic resonance (NMR) spectroscopy since
the outbreaks of Severe Acute Respiratory Syndrome coronavirus (SARS-CoV)
in 2003 as well as Middle-East Respiratory Syndrome coronavirus
(MERS-CoV) in 2012. In this review, the structures and functions of these
domains of Nsp3 are discussed in depth. This article is part of the series “From
SARS to MERS: Research on highly pathogenic human coronaviruses”
(Hilgenfeld & Peiris, Antiviral Res. 100, 286-295 (2013)).


Keywords: ubiquitin-like domain; papain-like protease; macrodomain;

Abbreviations: GST, glutathione S-transferase; IRF, interferon regulatory
factor; NF-κB, nuclear factor kappa-light-chain-enhancer of activated B cells;
TAB3, TGF-beta-activated kinase 1 and MAP3K7-binding protein 3; Bbc3,
Bcl-2-binding component 3; TRAF, TNF receptor-associated factor; RIG-I,
retinoic acid-inducible gene I; STING, stimulator of interferon genes; TBK1,
TANK-binding kinase 1; MDM2, mouse double minute 2 homolog; RCHY1,
RING finger and CHY zinc finger domain-containing protein 1; PAIP1,
poly(A)-binding protein-interacting protein 1; MKRN, makorin ring finger
protein.


1. Introduction

This review of published research on the coronavirus non-structural protein 3
(Nsp3) forms part of a series in Antiviral Research on “From SARS to MERS:
research on highly pathogenic human coronaviruses” (Hilgenfeld & Peiris,
2013). Two excellent earlier papers dealt with aspects of Nsp3. The first
described the state of knowledge of the papain-like protease (PLpro)
(Baez-Santos et al., 2015), while the second adopted a bioinformatics
viewpoint when describing Nsp3 and other non-structural proteins involved in
anchoring the coronavirus replication/transcription complex (RTC) to modified
membranous structures originating from the endoplasmic reticulum (ER)
(Neuman, 2016). We build on these fine reviews, focusing on recent results
and discussing the structures and functions of the individual Nsp3 domains in
sequential order.


Coronaviruses (CoVs) are members of the subfamily Coronavirinae within the
family Coronaviridae of the order Nidovirales. They are enveloped
positive-sense single-stranded RNA (+ssRNA) viruses with the largest genomes
of all RNA viruses known thus far (Brian & Baric, 2005; Gorbalenya et al.,
2006). The genomes of different CoVs comprise between 26 and 32 kilobases;
however, the overall organization of the genomes is similar. The 5'-terminal
two thirds of the genome include two open reading frames (ORFs), 1a and 1b,
that together encode all non-structural proteins for the formation of the RTC,
whereas the 3'-proximal third encodes the structural and accessory proteins
(Fig. 1A; Brian & Baric, 2005). ORF1a encodes polyprotein (pp) 1a containing
Nsp1-11, while ORF1a and ORF1b together produce pp1ab containing
Nsp1-16 through a (-1) ribosomal frameshift overreading the stop codon of
ORF1a (Fig. 1A; Brierley et al., 1989). Coronaviruses are divided into four
genera: Alphacoronavirus, Betacoronavirus, Gammacoronavirus, and
Deltacoronavirus (Adams & Carstens, 2012). CoVs can infect many species
(Fehr & Perlman, 2015); currently, the coronaviruses infecting humans are all
from the genera alpha-CoV or beta-CoV. HCoV 229E and HCoV NL63 belong
to the former (Tyrrell & Bynoe, 1965; van der Hoek et al., 2004), whereas
HCoV OC43, HKU1, SARS-CoV, and MERS-CoV belong to the latter genus
(Hamre & Procknow, 1966; Woo et al., 2005; Drosten et al., 2003; Ksiazek et
al., 2003; Kuiken et al., 2003; Peiris et al., 2003; Zaki et al., 2012).
Furthermore, HCoV OC43 and HKU1 belong to clade A of beta-CoV, while the
two highly pathogenic human CoVs, SARS-CoV and MERS-CoV, are from
clades B and C, respectively.


Nsp3 is the largest multi-domain protein produced by coronaviruses (Fig. 1A).
It features a somewhat different domain organization in different CoV genera.
The individual coronaviruses can possess 10 to 16 domains, of which eight
domains and two transmembrane regions are conserved, according to a
recent bioinformatic analysis (Neuman, 2016). The domain organizations of
Nsp3 from HCoV NL63, as a representative of the alpha-CoVs, and from
SARS-CoV, in clade B of the genus beta-CoV, are displayed in Fig. 1A. Nsp3 is
released from pp1a/1ab by the papain-like protease domain(s), which is (are)
part of Nsp3 itself (Fig. 1A; Ziebuhr et al., 2000). Nsp3 plays many roles in the
viral life cycle (Fig. 1B). It can act as a scaffold protein to interact with itself and
to bind other viral Nsps or host proteins (von Brunn et al., 2007; Pan et al.,
2008; Imbert et al., 2008; Pfefferle et al., 2011; Ma-Lauer et al., 2016). In
particular, Nsp3 is essential for RTC formation (van Hemert et al., 2008;
Angelini et al., 2013). The RTC is associated with modified host ER
membranes that produce convoluted membranes (CMs) and
double-membrane vesicles (DMVs) in SARS-CoV-, MHV (mouse hepatitis
virus)- as well as MERS-CoV-infected cells (Snijder et al., 2006; Knoops et al.,
2008; Hagemeijer et al., 2011; de Wilde et al., 2013). Nsp3 and Nsp5 were
detected on the CMs in SARS-CoV-infected cells by immunogold electron
microscopy (Knoops et al., 2008). Co-expression of Nsp3, Nsp4, and Nsp6
can induce DMV formation in SARS-CoV-infected cells, but the same result
was not observed when Nsp3 lacking its C-terminal third (residues 1319-1922)
was co-expressed with Nsp4 and Nsp6 (Angelini et al., 2013). Correspondingly,
co-expression of only the C-terminal third of Nsp3 (residues 1256-1922) and
Nsp4 induces the occurrence of the zippered ER and membrane curvature in
SARS-CoV- or MHV-infected cells, which is likely to enhance DMV formation
(Hagemeijer et al., 2014). Taken together, Nsp3 is a key component for
coronavirus replication; however, many functions of Nsp3 remain to be
investigated. In this review, the current knowledge on the structures and
functions of the individual Nsp3 domains is summarized and discussed.


2. Ubiquitin-like domain 1 and the Glu-rich acidic region 

The ubiquitin-like domain 1 (Ubl1) and the Glu-rich acidic region are located at 
the N-terminus of Nsp3. These two regions together are also named “Nsp3a” 
(Neuman et al., 2008). Nsp3a exists in all CoVs in spite of no more than 15% 
amino-acid sequence identity between the domains in CoVs from different
genera.


Two Ubl1 structures from betacoronaviruses of different clades have been
determined by NMR spectroscopy so far (Table 1); one is from SARS-CoV in
clade B (Serrano et al., 2007) and the other from MHV in clade A (Keane and
Giedroc, 2013). In SARS-CoV, the Ubl1 comprises residues 1-112; the core
residues 20-108 form a typical ubiquitin-like fold with secondary-structure
elements in the following order: β1-α1-β2-α2-η1-α3-β3-β4 (η: 3₁₀ helix; Fig.
2A; Serrano et al., 2007); residues outside this core are flexible. The
well-defined structure of MHV Ubl1 (residues 19-114) with the
secondary-structure elements β1-α1-β2-α2-α3-β3-β4 is similar to that of
SARS-CoV Ubl1 (Fig. 2B), with a root-mean-square deviation (R.M.S.D.) of
2.8 Å (for 85 out of 95 Cα atoms; Z-score: 7.4) according to the Dali server
(Holm & Rosenström, 2010). A structural difference between the two Ubl1
domains is that the two disjoined helices η1-α3 in SARS-CoV Ubl1 are
replaced by one long continuous helix (α3) in MHV Ubl1 (Fig. 2A and B).
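For readers less familiar with the R.M.S.D. values quoted for structural comparisons, the following is a minimal sketch of how such a number is obtained once two sets of Cα atoms have been paired and superimposed. It is an illustration under stated assumptions, not the procedure of the Dali server, which performs its own alignment and superposition; the function name and the toy coordinates are hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two paired lists of (x, y, z) points.

    Assumes the atoms are already matched one-to-one and superimposed;
    no alignment or fitting is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared_sum = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        squared_sum += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(squared_sum / len(coords_a))


# Hypothetical toy coordinates in Å: the second set is the first one
# shifted rigidly by 2.8 Å along x, so every atom pair is 2.8 Å apart.
toy_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
toy_b = [(x + 2.8, y, z) for x, y, z in toy_a]

print(f"R.M.S.D. = {rmsd(toy_a, toy_b):.1f} Å")
```

Because only matched atoms enter the sum, an R.M.S.D. is meaningful only together with the number of atom pairs it was computed over, which is why the values in the text are quoted as "for 85 out of 95 Cα atoms".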


The known functional roles of Ubl1 in CoVs are related to ssRNA binding and
interacting with the nucleocapsid (N) protein (Fig. 1B; Serrano et al., 2007;
Hurst et al., 2010, 2013). The Ubl1 of SARS-CoV binds single-stranded RNA
(ssRNA) containing AUA patterns. Surprisingly, many negatively charged
regions (such as the 3₁₀ helix, η1) show obvious conformational changes in the
NMR spectra when RNA is added to the protein solution (Serrano et al., 2007),
indicating that RNA binding has long-range effects on the protein conformation.
In view of the presence of several AUA repeats in the 5'-untranslated region
(UTR) of the SARS-CoV genome, the Ubl1 likely binds to this region.


In MHV, the Ubl1 domain efficiently binds the cognate nucleocapsid (N) protein;
thus it seems to be important for virus replication as well as initiation of viral
infection. There is a critical relationship between Nsp3 interaction with the N
protein and infectivity, as this interaction serves to tether the viral genome to
the newly translated RTC at an early stage of coronavirus infection (Hurst et al.,
2010, 2013). Deletion of the Ubl1 core (residues 19-111) of MHV abrogates
viral replication (Hurst et al., 2013). The major interface regions of the
Ubl1-N complex involve acidic residues of Ubl1 helix α2 and the serine- and
arginine-rich region (SR-rich region) of the N protein, as shown by NMR
titration experiments (Keane & Giedroc, 2013). However, the acidic residues in
helix α2 are not absolutely conserved among different CoVs, implying that the
details of the interactions between Ubl1 and N protein will not be the same. In
addition, the binding affinity between the bovine coronavirus (BCoV) N
(residues 57-216) and MHV Ubl1 is about 260-fold lower compared to MHV N
(residues 60-219) and its cognate Ubl1 (Keane & Giedroc, 2013). A structure
of the Ubl1-N complex would help understand why non-cognate Ubl1 and N
protein bind weakly to each other. Thus far, only a computer docking model of
the MHV Ubl1-N complex has been reported (Tatar & Tok, 2016). This model
proposes that residues of β1, α1, the loop between β1 and α1, β3, and β4 of
MHV Ubl1 interact with the N-terminal domain (NTD) as well as the SR-rich
region of the N protein. In contrast to what was suggested above, most
acidic residues of Ubl1 helix α2 do not interact with the SR-rich region of N in
the docking model (Tatar & Tok, 2016).


The interaction between the N protein and nucleic acid is essential for CoV
genome transcription (Chang et al., 2014). The NTD plus the SR-rich region
(residues 60-219) of MHV N play an important role in interacting with
transcriptional regulatory sequence (TRS) RNA (Grossoehme et al., 2009).
The N-TRS RNA complex prevents the formation of the Ubl1-N complex
(Keane & Giedroc, 2013). The competition between N protein binding to either
the TRS or the Ubl1 might be connected to the switch between viral
transcription and replication. It has been shown that the SR region of N protein
can be phosphorylated (Peng et al., 2008). Each of two phosphomimetic
substitutions of serine residues predicted to be phosphorylated (S207D and
S218D) in the SR region of MHV N decreases the binding affinity to Ubl1 by
about 3-fold, compared to wild-type N (Keane & Giedroc, 2013).
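The fold changes in binding affinity quoted above (about 260-fold for BCoV N binding to MHV Ubl1 relative to the cognate pair, and about 3-fold for each phosphomimetic substitution) can be put on a free-energy scale with the standard relation ΔΔG = RT ln(fold change). The sketch below is only an illustration: it assumes that "fold lower affinity" corresponds to the ratio of dissociation constants and that the temperature is 25 °C (298.15 K), neither of which is stated in the text; the function name is hypothetical.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)


def ddg_from_fold_change(fold, temperature_k=298.15):
    """Binding free-energy difference (kcal/mol) for a given fold change in Kd.

    Assumption: fold = Kd(weaker complex) / Kd(reference complex),
    evaluated at an assumed temperature of 298.15 K.
    """
    if fold <= 0:
        raise ValueError("fold change must be positive")
    return R_KCAL * temperature_k * math.log(fold)


comparisons = [
    ("BCoV N vs. cognate MHV N (about 260-fold)", 260),
    ("S207D or S218D vs. wild-type N (about 3-fold)", 3),
]
for label, fold in comparisons:
    print(f"{label}: {ddg_from_fold_change(fold):.2f} kcal/mol")
```

Under these assumptions, the 260-fold difference corresponds to roughly 3.3 kcal/mol, while a 3-fold change is well below 1 kcal/mol, which illustrates how much smaller the effect of a single phosphomimetic substitution is than that of swapping the N protein for a non-cognate one.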


The overall structure of the SARS-CoV Ubl1 domain is similar to human
ubiquitin (Ub) and that of each of the two ubiquitin-like domains of human or
mouse interferon-stimulated gene 15 (ISG15) (Fig. 2D and E; Vijay-Kumar et
al., 1987; Narasimhan et al., 2005; Daczkowski et al., 2017). In human Ub as
well as in the ISG15s, only a short 3₁₀ helix is found at the position of η1-α3 or
α3 in Ubl1 of SARS-CoV or MHV (Fig. 2D and E). Ub and ISG15 are important
for innate antiviral immunity (Heaton et al., 2016; Morales & Lenschow, 2013);
therefore, viruses tend to not only inhibit the conjugation of Ub or ISG15 to
targets but also remove Ub or ISG15 from ubiquitinated or ISGylated proteins,
respectively (Yuan & Krug, 2001; Bakshi et al., 2013; Yang et al., 2014). Thus,
in CoVs, one or two papain-like protease (PLpro) domain(s) within Nsp3
possess deubiquitinating (DUB) and deISGylating activities (see below; for a
recent review on the role of viral proteases in counteracting the host cell's
innate immune system, see Lei & Hilgenfeld (2017)). Interestingly, two
ubiquitin-like domains (Ubl1 and Ubl2) exist in all CoVs (see below; Neuman,
2016). Considering that ubiquitin-like modules are often involved in
protein-protein interactions to regulate various biological processes
(Hochstrasser, 2009), such as the MHV Ubl1-N interaction mentioned above,
a novel possible function of Ub-like domains in CoVs might be the interaction
with target proteins of Ub (or ISG15) by mimicking the shape of these two
molecules. The purpose of such mimicry could be to somehow interfere with
pathways involving ubiquitinylated or ISGylated host targets, thereby leading
to disruption of host anti-viral signal transduction or protein degradation.


The Ubl1 of SARS-CoV is also similar to the Ras-interacting domain (RID) of
RalGDS (Ral guanine nucleotide dissociation stimulator; Fig. 2F; Serrano et al.,
2007). Ras regulates cell-cycle progression via binding to the RID of
Ras-interacting proteins (Hofer et al., 1994; Huang et al., 1998; Coleman et al.,
2004). By mimicking the RID, the Ubl1 might interrupt the interactions between
Ras and its effectors, thus affecting the cell cycle to support virus replication. In
agreement with this, it is known that both MHV and SARS-CoV induce
cell-cycle arrest in the G0/G1 phase (Chen & Makino, 2004; Yuan et al., 2005).


Following the Ubl1, the second subdomain of Nsp3a in CoVs is the Glu-rich 
acidic region. It comprises residues 113-183 of SARS-CoV Nsp3, with more 
than 35% Glu and 10% Asp (Serrano et al., 2007). Because of the 
non-conserved amino-acid sequence, this region is also designated as 
“hypervariable region (HVR)” (Neuman, 2016). The HVR region is intrinsically 
disordered in SARS-CoV and in MHV (Serrano et al., 2007; Keane & Giedroc, 
2013) and does not affect the conformation of the globular Ubl1 domain in 
SARS-CoV (Serrano et al., 2007). Currently, the function of HVR in CoVs is 
unknown. Glu/Asp-rich proteins are often involved in many biological roles, 
such as DNA/RNA mimicry, metal-ion binding, and protein-protein interactions 
(Chou & Wang, 2015). The Ubl1+HVR region has been demonstrated via a 
yeast-two-hybrid (Y2H) assay to interact with SARS-CoV Nsp6, whereas a 
GST pull-down study identified Nsp8, Nsp9, and NAB-BSM-TM1 of Nsp3 (NAB: 
nucleic-acid binding domain; BSM: betacoronavirus-specific marker; TM1: 
transmembrane region 1; see below) as binding partners (Imbert et al., 2008). 
Does the HVR play any role in these protein-protein interactions? This
question is yet to be answered. Furthermore, the acidic region is dispensable
for MHV replication (Hurst et al., 2013). On the other hand, this region does
exist in all CoVs. It is conceivable that it may have regulatory rather than
essential roles in the coronavirus replication process. However, the exact
role(s) of the acidic region in CoVs should be further investigated.


3. Papain-like protease 1 domain 

The papain-like protease domain(s) is/are responsible for releasing Nsp1,
Nsp2, and Nsp3 from the N-terminal region of polyproteins 1a/1ab in CoVs
(Harcourt et al., 2004; Barretto et al., 2005). The papain-like protease 1
domain (PL1pro) follows the HVR (see Fig. 1A) in the alpha-CoVs and in
clade A of beta-CoVs (Graham & Denison, 2006; Ziebuhr et al., 2001; Chen et
al., 2007; Wojdyla et al., 2010; Neuman, 2016). Interestingly, the PL1pro is not
complete in the gamma-CoV infectious bronchitis virus (IBV; Ziebuhr et al.,
2001) and in Hipposideros pratti bat CoV, a virus related to clade B of the
beta-CoVs (GenBank code NC_025217.1; Neuman, 2016). In these latter
viruses, some parts (such as the zinc-finger motif; see below) and the residues
of the catalytic triad of the PL1pros are missing. Furthermore, the PL1pro is
totally absent in beta-CoV clades B, C, and D as well as in delta-CoVs. The
two highly pathogenic human coronaviruses, SARS-CoV (Fig. 1A) and
MERS-CoV, thus do not have a PL1pro domain; they only possess the other
papain-like protease, the PL2pro domain that is conserved in all coronaviruses
(see below). It is still not clear why certain CoVs encode two PLpros.


Thus far, only one structure of a PL1pro domain has been determined, that from
the alpha-CoV Transmissible Gastroenteritis Virus (TGEV) (Table 1; Wojdyla
et al., 2010). The PL1pro resembles an extended right-hand scaffold with thumb,
palm, and fingers subdomains (Fig. 3). It contains a zinc finger in the fingers
subdomain as well as a catalytic triad, Cys32-His183-Asp196. A canonical
oxyanion hole as known from papain (Menard et al., 1991) is present in TGEV
PL1pro, with the main-chain amide of the catalytic cysteine residue and the
side-chain of a glutamine residue (Gln27) 5 residues N-terminal to the cysteine
contributing to the stabilization of the oxyanion transition state of peptide
hydrolysis (Fig. 3; Wojdyla et al., 2010). The fold of the PL1pro is similar to that
of the PL2pro of SARS-CoV (see below; R.M.S.D. 3.1 Å, for 202 out of 211 Cα
atoms; Dali Z-score = 18.4) and MERS-CoV (R.M.S.D. 3.1 Å, for 198 out of
211 Cα atoms; Z-score = 18.7) as well as to several human ubiquitin-specific
proteases (USPs, such as USP 2, 7, 14, 21 etc., Z-scores from 11.2 to 12.6)
(Ratia et al., 2006; Wojdyla et al., 2010; Lei et al., 2014). The biggest
difference between these PLpro domains is found in the zinc-finger regions,
which are obviously flexible (Wojdyla et al., 2010; Lei et al., 2014).
Furthermore, the electrostatic surface potential of TGEV PL1pro features two
negative patches which are absent in SARS-CoV PL2pro (Wojdyla et al., 2010).
One patch is located at the opposite side of the active site, between the thumb
and palm subdomains, and the other is near the active-site groove and the
surrounding region, between the thumb and fingers subdomains (Wojdyla et
al., 2010). The latter patch is related to the substrate binding and specificity of
TGEV PL1pro (Wojdyla et al., 2010).


The PL1pro of TGEV has been demonstrated to process the cleavage site
Nsp2↓3 (↓: cleavage site) and to exhibit DUB activity to remove ubiquitin from
Lys48-/Lys63-linked Ub chains in vitro (Putics et al., 2006; Wojdyla et al.,
2010). The P4-P1 residues of the cleavage site between Nsp2 and 3 are
Lys-Met-Gly-Gly in TGEV (Table 2), while the last four residues of ubiquitin
are Leu-Arg-Gly-Gly. Therefore, the S4 pocket of TGEV PL1pro should be
able to accommodate residues as different as Lys and Leu. In contrast, the
P4-P1 residues in the polyprotein substrates of PL2pro are Leu-Xaa-Gly-Gly
(Xaa is Asn or Lys) in SARS-CoV (Harcourt et al., 2004; Barretto et al., 2005).
P4 is the same residue as in the ubiquitin substrate; thus, the corresponding
pocket of SARS-CoV PL2pro is tailor-made for leucine. Residues Ile155,
Tyr175, and Thr209 form the S4 subsite in TGEV PL1pro (Fig. 3; Wojdyla et al.,
2010), whereas the corresponding residues in SARS-CoV PL2pro are Pro249,
Tyr265, and Thr302 (Ratia et al., 2006). When superimposing the structures,
Wojdyla et al. (2010) found that the Cα atom of Ile155 of TGEV PL1pro is 3 Å
away from the Cα atom of the corresponding Pro249 in SARS-CoV PL2pro,
thereby creating a larger S4 pocket in TGEV PL1pro, so that it can bind lysine,
in addition to leucine.
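The comparison of P4-P1 residues made above can be restated as a simple pattern check. The sketch below encodes the four-residue sequences given in the text in one-letter code and tests them against the Leu-Xaa-Gly-Gly consensus of the SARS-CoV PL2pro substrates. It is a naive string comparison for illustration only and says nothing about actual binding or cleavage efficiency; the dictionary labels and the function name are hypothetical.

```python
import re

# Leu-Xaa-Gly-Gly consensus (P4-P1) in one-letter code; "." stands for Xaa.
LXGG = re.compile(r"^L.GG$")

# P4-P1 residues as quoted in the text, converted to one-letter code.
P4_TO_P1 = {
    "TGEV Nsp2-Nsp3 cleavage site": "KMGG",   # Lys-Met-Gly-Gly
    "ubiquitin C-terminus": "LRGG",           # Leu-Arg-Gly-Gly
    "SARS-CoV substrate (Xaa = Asn)": "LNGG",
    "SARS-CoV substrate (Xaa = Lys)": "LKGG",
}


def fits_lxgg(p4_to_p1):
    """Return True if a four-residue P4-P1 string matches Leu-Xaa-Gly-Gly."""
    return bool(LXGG.match(p4_to_p1))


for name, residues in P4_TO_P1.items():
    verdict = "matches" if fits_lxgg(residues) else "does not match"
    print(f"{name}: {residues} {verdict} L-X-G-G")
```

Only the TGEV polyprotein site, with lysine in P4, falls outside the consensus, which is the observation that motivates the discussion of the larger S4 pocket in TGEV PL1pro.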


As mentioned above, for reasons unknown so far, many CoVs contain two
PLpros. Both PL1pro and PL2pro are involved in releasing Nsp1, Nsp2, and Nsp3
in these CoVs. However, the two PLpros in different CoVs show varying
substrate specificity. The PL1pro of MHV cleaves Nsp1↓2 and Nsp2↓3, while
the PL2pro cleaves Nsp3↓4 (Table 2; Bonilla et al., 1997; Kanjanahaluethai &
Baker, 2000). Human coronavirus NL63 (HCoV-NL63) PL1pro processes
Nsp1↓2 while the PL2pro processes the other two cleavage sites, Nsp2↓3 and
Nsp3↓4 (Table 2; Chen et al., 2007). Both PL1pro and PL2pro of HCoV 229E
can cleave Nsp1↓2 and 2↓3 (Table 2); however, the PL1pro is more efficient in
cleaving Nsp1↓2 while the PL2pro is more efficient with respect to the latter site
(Ziebuhr et al., 2007). Some viruses, such as SARS-CoV, MERS-CoV, and
IBV, comprise only one functional PL2pro to process all three cleavage sites
(Table 2). The residues (P5-P2′) of the three cleavage sites are diversified in
MHV, HCoV NL63, and HCoV 229E, although the P1 is conserved as a small
residue (Gly or Ala) (Table 2). In contrast, the P1 and P2 residues (Gly-Gly or
Ala-Gly) are absolutely identical in all the cleavage sites of SARS-CoV,
MERS-CoV, and IBV; furthermore, the P4-P1 residues are, to a certain
extent, conserved in each of these three viruses (Table 2). Therefore, the
presence of two PLpros with slightly different substrate specificity in some CoVs
may be required to cleave native substrates that deviate from the uniform ones
processed by SARS-CoV, MERS-CoV, or IBV PLpros. Unfortunately, studies
on the details of recognition of different substrates by PL1pro and PL2pro are
hampered by the fact that no crystal structures of the two enzymes from the
same virus are available.


4. Macrodomains and the "Domain Preceding Ubl2 and PL2pro" (DPUP)
(1) Macrodomain I (Mac1, X domain)

A conserved macrodomain (also called “X domain”, Nsp3b) follows the HVR or
the PL1pro domain in all coronaviruses (Fig. 1A; Gorbalenya et al., 1991;
Neuman et al., 2008; Neuman, 2016). Macrodomains widely exist in bacteria,
archaea, and eukaryotes (Han et al., 2011). In addition, these conserved
domains are also present in several positive-sense ssRNA (+ssRNA) viruses
of the families Hepeviridae, Togaviridae, and Coronaviridae, such as hepatitis
E virus (HEV), alphavirus, rubivirus, and all coronaviruses (Koonin et al., 1992;
Snijder et al., 2003). Our group has shown that the X domain (Mac1) is
dispensable for RNA replication in the context of a SARS-CoV replicon (Kusov
et al., 2015). Recently, evidence accumulated showing that the X domain plays
a role in counteracting the host innate immune response (Eriksson et al., 2008;
Kuri et al., 2011; Fehr et al., 2015, 2016).


The first crystal structure of an Nsp3 domain of any coronavirus was the 
unliganded X domain of SARS-CoV (Table 1; Saikatendu et al., 2005). A little 
later, the structure of the SARS-CoV X domain in complex with ADP-ribose 
(ADPr) was determined (Table 1; Egloff et al., 2006). Subsequently, structures 
of the unliganded X domain and/or its complex with ADPr from HCoV 229E, 
IBV, HCoV NL63, Feline CoV (FCoV), and MERS-CoV were reported (Table 1; 
Piotrowski et al., 2009; Xu et al., 2009; Wojdyla et al., 2009; Cho et al., 2016). 
All structures show that the X domain adopts a conserved three-layered α/β/α
sandwich fold (Fig. 4). The domain with this fold is called a macrodomain
because of its similarity to the extra domain in the MacroH2A variant of human
histone 2A (Pehrson & Fried, 1992; Saikatendu et al., 2005). Typically, the X
domain includes a central β sheet with seven β strands in the order
β1-β2-β7-β6-β3-β5-β4, with β1 and β4 being antiparallel to the rest (Fig. 4).
Only the X domain of IBV is an exception, since it lacks the first strand, β1
(Piotrowski et al., 2009; Xu et al., 2009). Six helices are located on the two
sides of this β sheet, with helices α1, α2, and α3 on one side and α4, α5, and
α6 on the other (Fig. 4).


One function of the conserved macrodomain is the binding of ADP-ribose or 
poly(ADP-ribose) (Han et al., 2011). The binding characteristics are the same 
in most X domains of coronaviruses (Egloff et al., 2006; Xu et al., 2009; 
Wojdyla et al., 2009; Cho et al., 2016). Like Cho et al. (2016), we have 
determined the crystal structure of the MERS-CoV X domain in complex with 
ADP-ribose (ADPr) (PDB entry: 5HOL; Fig. 4). Our structure and the 
ADPr-binding pattern are almost identical to the structure (PDB entry: 5DUS) 
described by Cho et al. (2016) and the structure of the SARS-CoV X domain in 
complex with ADPr (PDB entry: 2FAV; Egloff et al., 2006). The R.M.S.D. values are
0.4 Å (for 165 out of 165 Cα atoms; Z-score: 34.2) and 1.2 Å (for 163 out of
171 Cα atoms; Z-score: 28.3), respectively, according to the Dali server (Holm
& Rosenström, 2010). Here, we describe the structure of the MERS-CoV X
domain in complex with ADPr from our own laboratory as an example (Fig. 4).
The ADPr is located in a cleft at the top of the central β sheet (β7-β6-β3-β5).
Five stretches of amino-acid residues are mainly involved in the binding of
ADPr: I, Gly20-Ala22; II, Ala37-Asn39; III, Lys43-Ala49 (including a
“45-GGG-47” triple-glycine motif); IV, Pro124-Phe131; V, Val153-Asn155 (Fig.
4). The adenine base is in contact with regions I and V. In particular, the
side-chain of Asp21 accepts a hydrogen bond from the exocyclic NH2 group in
position 6 of the adenine, thereby fixing the orientation of the base. This Asp
residue is conserved in macrodomains from bacteria, archaea, and eukaryotes
(Saikatendu et al., 2005; Egloff et al., 2006). When the corresponding Asp20 in
the macrodomain protein AF1521 of Archaeoglobus fulgidus was replaced by
alanine, the ADP-ribose binding affinity was reduced almost 90-fold (Karras et
al., 2005). The central ribose moiety is located between regions IV and V. The
O2′ of ADPr forms a hydrogen bond with a water molecule (H2O 308) that is
stabilized by the side-chain of Asn155 (region V). The two phosphate groups
accept a total of four hydrogen bonds from Ile48 (region III) and Gly129, Ile130
as well as Phe131 (region IV). The distal ribose is in contact with regions II and
III; the O1″ and O2″ of this ribose form hydrogen bonds with the amides of
Gly47 and Gly45 (region III), respectively. The O3″ forms a hydrogen bond
with the side-chain amide of Asn39 (region II). Thus, Asp21 and Asn39 appear
to fix the two ends of the ADP-ribose, thereby stabilizing its binding to the cleft
(Fig. 4). Surprisingly, the orientation of the corresponding Asp in the
HCoV-229E X domain is different; this Asp does not directly bind ADP-ribose
but is in contact with its neighboring residue Thr22, and not with the N6 atom
of adenine (Piotrowski et al., 2009; Xu et al., 2009). This difference could
explain why the binding affinity between the X domain of HCoV 229E and
ADPr is about 10-fold lower than that of the MERS-CoV homologue
(Piotrowski et al., 2009; Cho et al., 2016). Interestingly, the X domain from IBV
strain M41 but not of IBV strain Beaudette can bind ADPr (Xu et al., 2009;
Piotrowski et al., 2009). The important “Gly-Gly-Gly” motif of the M41 X
domain, involved in binding the distal ribose, is mutated to “Gly-Ser-Gly” in
the Beaudette virus, thus preventing ADPr interaction with the X domain
(Piotrowski et al., 2009). The virulence of IBV strain Beaudette is attenuated
compared to that of IBV strain M41 (Geilhausen et al., 1973). It is an
interesting hypothesis that the loss of the ability to bind ADPr may be one of
the reasons for the lower pathogenicity of the former IBV.


Macrodomains of some CoVs have been shown to exhibit a weak
ADP-ribose-1″-phosphate phosphatase (ADRP) activity in vitro (kcat ~ 5-20
min⁻¹; Saikatendu et al., 2005; Egloff et al., 2006; Putics et al., 2006). The
residue Asn41 of SARS-CoV (corresponding to the Asn39 in MERS-CoV
mentioned above) is essential for ADRP activity (Egloff et al., 2006). However,
the ADRP activity is dispensable for HCoV-229E replication in cell culture
(Putics et al., 2005). On the other hand, when the ADRP activity of the
HCoV-229E or that of the SARS-CoV X domain is inactivated through
replacement of the Asn mentioned above by Ala, mutant viruses exhibit
increased interferon α (IFN-α) sensitivity (Kuri et al., 2011). Interestingly, the
corresponding mutants in MHV (strains A59 and JHM) and a mouse-adapted
SARS-CoV do not show an increased IFN-β sensitivity (Eriksson et al., 2008;
Fehr et al., 2015, 2016).

Fehr et al. (2016) confirmed that the wild-type X domain of SARS-CoV inhibits
the expression of innate-immunity genes (such as IFN-β, interleukin 6 (IL-6)) in
vitro and thereby blocks the host immune response. At variance with this,
Eriksson et al. (2008) and Fehr et al. (2015) reported that the Asn-to-Ala
mutation in the MHV (strains A59 and JHM, resp.) X domain reduces the
production of inflammatory cytokines (e.g., IL-6) in vitro and in vivo. Eriksson et
al. (2008) hypothesized that the X domain aggravates MHV-induced severe
liver pathology, likely by inducing the expression of inflammatory cytokines.
These results suggest that the main function of the X domain may differ in
different CoVs. On the other hand, the expression level of type-I IFN (α or β) is
increased in cells infected with SARS-CoV or MHV carrying the Asn-to-Ala
mutation in the X domain (Eriksson et al., 2008; Kuri et al., 2011; Fehr et al.,
2016). This indicates that suppression of innate immunity by the X domain may
be a feature conserved across the coronaviruses.


Recently, it was demonstrated that macrodomains from several +ssRNA
viruses (such as HEV, SARS-CoV, HCoV 229E, Venezuelan equine
encephalitis virus (VEEV), and Chikungunya virus (CHIKV)) act as hydrolases
removing mono- and/or poly(ADP-ribose) from mono- or poly(ADP-ribosyl)ated
proteins, activities designated as de-mono-ADP-ribosylation (de-MARylation)
and de-poly-ADP-ribosylation (de-PARylation), respectively (Li et al., 2016a;
Fehr et al., 2016; Eckei et al., 2017; McPherson et al., 2017). The weak ADRP
activity described for the X domain in the literature is most probably just a
non-physiological side reaction of de-MARylation and/or de-PARylation.


The ADP-ribosylation (MARylation or PARylation) of proteins is a reversible
posttranslational modification involved in various cellular processes (Aravind et
al., 2015; Liu & Yu, 2015). Poly(ADP-ribose) polymerases (PARPs, also
named ARTDs, ADP-ribosyltransferases diphtheria toxin-like) are responsible
for transferring mono- or poly(ADP-ribose) to target proteins (Liu & Yu, 2015).
For example, PARP7 (ARTD14), PARP10 (ARTD10), PARP12 (ARTD12), and
PARP14 (ARTD8) add mono-ADPr to other proteins and themselves
(Bütepage et al., 2015), while PARP1 (ARTD1) and PARP2 (ARTD2) add
poly-(ADPr)s (Gibson & Kraus, 2012). Various amino-acid residues have been
identified as acceptor sites for ADP-ribosylation; this still seems to be a matter
of some debate. Arg and Ser have certainly been shown to accept ADPr(s)
(Laing et al., 2011; Leidecker et al., 2016), but the acidic residues are also
thought to be important sites of ADP-ribosylation (Feijs et al., 2013). PARP7,
10, and 12 can act as type-I IFN-stimulated genes (ISGs) and inhibit VEEV
replication (Atasheva et al., 2014). Also, Verheugd et al. (2013) reported that
PARP10 can block the NF-κB pathway via MARylation of NEMO ("NF-κB
essential modulator"). Moreover, the mRNA and protein synthesis of PARP14
(ARTD8) and PARP10 are stimulated by IFN-α in vivo (Eckei et al., 2017).
Therefore, some PARPs play a role in the host immune defense. Recently, it
has been demonstrated that the X domains of SARS-CoV and HCoV 229E
possess the ability to de-MARylate the ADP-ribosylated PARP10 catalytic
domain in vitro (Fehr et al., 2016; Li et al., 2016a). However, the relationship
between the de-MARylation function of viral macrodomains and their
anti-innate immunity activity is still unclear. The de-MARylation activity is a
common feature of the X domain (i.e., the first of the macrodomains if there is
more than one) of all investigated macrodomain-encoding viruses (Li et al.,
2016a; Eckei et al., 2017). Interestingly, the macrodomains of VEEV and
SARS-CoV can also remove the entire PAR chain from PARylated PARP5a,
PARP1, and PARP3 (ARTD3), without releasing free monomeric ADPr (Li et
al., 2016a). Therefore, the macrodomains of these two viruses hydrolyze the
amino acid-ADPr ester bond but not ribose-ribosyl glycosidic bonds in PAR
chains. A similar observation was also made for the macrodomain of CHIKV,
although the de-PARylation of PARylated PARP1 was weak (Eckei et al.,
2017). Currently it is unknown whether the de-PARylation activity of
2017). Currently it is unknown whether the de-PARylation activity of 


macrodomains plays any role in the coronavirus life cycle. 


The conserved Asn42 residue, the triple-glycine 48-GGG-50 motif, and Gly123
of the HEV macrodomain (corresponding to Asn39, 45-GGG-47, and Gly129
of the MERS-CoV X domain mentioned above) are essential for the
de-MARylation activity (Li et al., 2016a). This is not surprising because they
are involved in binding ADP-ribose. A putative mechanism for the
de-MARylation activity of the VEEV macrodomain has been proposed (Li et al.,
2016a). It is assumed that a water molecule performs a nucleophilic attack
onto the C1″ atom of the mono(ADP-ribose). An equivalent water molecule
(H2O 310) also exists in our structure of the MERS-CoV X domain–ADPr
complex (Fig. 4).


Interestingly, the neighboring helicase domain of HEV can increase the
de-PARylation activity of the macrodomain by about 11-fold but not the
de-MARylation activity, perhaps because the helicase can support binding of
the PAR chain (Li et al., 2016a). This observation raises the question whether
a similar phenomenon exists in CoVs. Should the neighboring domains
indeed have an influence on the de-MARylation/de-PARylation activities of the
CoV X domain, this effect should differ between the various viruses, as there is
little conservation of the neighboring regions. In addition, other CoV Nsps have
been demonstrated to interact with the X domain. Using a GST pull-down assay,
the X domain of SARS-CoV has been shown to bind the RNA-dependent RNA
polymerase, Nsp12 (Imbert et al., 2008). If this interaction does exist in the
virus life cycle, is it possible that the two proteins affect the enzymatic activity
of each other? Although many three-dimensional structures of CoV
macrodomains have been determined, more efforts should be made to study
the biological functions of this domain.


(I) Macrodomains II and III, and the DPUP (SUD-N, SUD-M, SUD-C)

Within Nsp3, a non-conserved region follows the X domain (or Mac1). When
the first SARS-CoV genome sequences were analyzed, this region was
recognized as a unique domain only existing in SARS-CoV and therefore
called “SARS-unique domain” (SUD) (Snijder et al., 2003). An alternative
name is “Nsp3c” (Neuman et al., 2008). The three-dimensional structure of this
region has been determined by X-ray crystallography and NMR spectroscopy
(Table 1; Tan et al., 2009; Chatterjee et al., 2009; Johnson et al., 2010). This
region includes three distinct subdomains: two macrodomains and one
frataxin-like fold (Fig. 5A-C). The three subdomains were named SUD-N,
SUD-M, and SUD-C, indicating the N-terminal, the middle, and the C-terminal
region of SUD, respectively. A region corresponding to parts of SUD was
found to exist in other coronaviruses, mostly of clades B, C, and D of the genus
Betacoronavirus (Neuman, 2016). For example, domains similar to SUD-M
and SUD-C (but not SUD-N) are also encoded by the MERS-CoV genome
(Kusov et al., 2015; Ma-Lauer et al., 2016). Thus, it is no longer appropriate to
call this domain "SARS-unique". Recently, the Nsp3 of MHV was shown by
X-ray crystallography to contain a SUD-C-like fold (Chen et al., 2015). These
authors renamed this region "Domain Preceding Ubl2 and PL2pro" (DPUP).
In this review, we follow the nomenclature proposed by Chen et al. (2015) and
Neuman (2016), and use the designations macrodomain II (Mac2),
macrodomain III (Mac3), and Domain Preceding Ubl2 and PL2pro (DPUP) for
SUD-N, SUD-M, and SUD-C, respectively.


Mac2 (SUD-N) has been shown to be dispensable for the SARS-CoV
replication/transcription complex within the context of a SARS-CoV replicon,
but surprisingly, Mac3 (SUD-M) is essential, even though it is not conserved
throughout the coronaviruses (Kusov et al., 2015). Mac2 and Mac3 each
display a typical α/β/α macrodomain fold (Fig. 5A and B). The central β sheet
with six β strands in the order β1–β6–β5–β2–β4–β3 is flanked by two (or three)
helices on either side. Only the last strand, β3, is antiparallel to the other
strands. Interestingly, Mac2 and Mac3 have the same number of β strands in
the central β sheet as the X domain of IBV (see above).
The R.M.S.D. values are 2.5–2.6 Å (for 119/171 Cα atoms) between Mac2,
Mac3, and the X domain of SARS-CoV, according to the Dali server (Holm &
Rosenström, 2010). The corresponding values are 2.6–2.7 Å (for 120/165
Cα atoms) when comparing SARS-CoV Mac2 and Mac3 with the X domain of
IBV. Although the X domain and Mac2/3 share the same fold, the sequence
identity among them is only about 11% (Tan et al., 2009). None of the residues
important for binding ADP-ribose and for de-MARylation/de-PARylation activity
(such as the Asn residue and the “GGG” triple-glycine motif interacting with the
distal ribose, as mentioned above) is conserved in Mac2/3; therefore,
Mac2/3 cannot bind ADP-ribose (Tan et al., 2009; Chatterjee et al., 2009).


Currently, most known functions of Mac2/3 are connected with RNA binding.
Mac2-3 (SUD-NM) preferentially binds oligo(G), which can form
G-quadruplexes; as expected for these structural modules, the binding affinity
is enhanced by potassium ions (Tan et al., 2007, 2009). According to a
mutational study, two positively charged lysine patches of Mac2 are involved
in oligo(G) binding, i.e. Lys476+Lys477 (in the loop between α3 and β5;
residue numbering starts at the N-terminus of Nsp3) and Lys505+Lys506 (at the
end of α4), while the residues Lys563+Lys565+Lys568 (+Glu571) of Mac3
(located between α2 and β3) are absolutely essential for binding (Fig. 5B; Tan
et al., 2009). Moreover, our laboratory has shown that mutation of the same
lysine patch of Mac3 in the context of the SARS-CoV replicon completely
abolished replication, indicating that binding of G-quadruplex RNA could be
an essential element of RTC activity (Kusov et al., 2015). Also, Mac3 can bind
(GGGA)2 and (GGGA)5 as well as (GGGA)2GG (Johnson et al., 2010). In
contrast, Mac3–DPUP (SUD-MC; DPUP: SUD-C, see below) only binds
(GGGA)2GG but not (GGGA)2 or (GGGA)5. A 3'-terminal G nucleotide is
apparently important for binding to Mac3–DPUP (Johnson et al., 2010). These
data indicate that the DPUP subdomain may fine-tune the specificity of RNA
binding by Mac3 (Johnson et al., 2010).


The SARS-CoV genome contains three G6-stretches and two G5-stretches
(Tan et al., 2009; Johnson et al., 2010), but none of them is conserved in all
SARS-CoV strains. However, two GGGAGGGUAGG nucleotide segments,
located in the Nsp2 and Nsp12 coding sequences, are highly conserved in
various SARS-CoV strains (Johnson et al., 2010). These two nucleotide
segments differ by only one base from the sequence favored by Mac3–DPUP,
(GGGA)2GG. Johnson et al. (2010) therefore proposed that these two
sequences could be potential physiological substrates of Mac3–DPUP.
Besides specific elements in the genome of SARS-CoV, Mac2-3 might bind
G-rich stretches in host mRNAs. In fact, Mac2-3 prefers to bind longer
G-stretches, such as (G)10 to (G)14 (Tan et al., 2007). Such long G-stretches
exist in several 3' non-translated regions of host mRNAs, such as the mRNAs
of the NF-κB signaling pathway-related protein TAB3 and of the apoptotic
signaling pathway protein Bbc3 (Tan et al., 2007, 2009). Mac2-3 may regulate
the expression of these genes by binding to the poly(G) stretches in the
corresponding mRNAs, thereby leading to disruption of the host antiviral
response as well as of apoptotic signals.
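
The one-base difference noted above between the conserved genomic segment
GGGAGGGUAGG and the preferred ligand (GGGA)2GG can be checked with a
small edit-distance calculation. The following is only an illustrative sketch: the
function and variable names are hypothetical and are not taken from any of the
cited studies.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-base substitutions,
    insertions, or deletions needed to turn sequence a into sequence b."""
    previous = list(range(len(b) + 1))
    for i, base_a in enumerate(a, start=1):
        current = [i]
        for j, base_b in enumerate(b, start=1):
            cost = 0 if base_a == base_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # match or substitution
        previous = current
    return previous[-1]


preferred_ligand = "GGGA" * 2 + "GG"   # (GGGA)2GG, as written in the text
genomic_segment = "GGGAGGGUAGG"        # conserved segment quoted in the text

print(edit_distance(preferred_ligand, genomic_segment))  # prints 1
```

The distance of 1 corresponds to the single additional U in the genomic segment.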


Mac3 has also been reported to bind oligo(A) (Chatterjee et al., 2009; Johnson
et al., 2010). This observation (which is not in agreement with the results
reported by Tan et al. (2007, 2009)) might suggest that Mac3 binds the poly(A)
tail of the viral genome, or of subgenomic mRNAs, or of host mRNA.
Poly(A)-binding protein (PABP) binds the genomic poly(A) tails of BCoV
(bovine coronavirus), MHV, and TGEV, thereby enhancing the replication of
these viruses (Spagnolo & Hogue, 2000; Galan et al., 2009). Is it possible that
Mac3 binding to oligo(A) competes with the binding between PABP and the
poly(A) tail? The question is yet to be answered.


Besides binding to nucleic acids, Mac2-3 of SARS-CoV has been shown to
interact directly with host proteins, e.g. the E3 ubiquitin ligase RCHY1
(Ma-Lauer et al., 2016). RCHY1 and several other host proteins (Paip1,
MKRN2, MKRN3, etc.) were reported to interact with Nsp3 (Pfefferle et al.,
2011). However, the detailed binding region(s) on Nsp3 had not been
identified. Ma-Lauer et al. (2016) demonstrated that Mac2-3 and the PL2pro of
Nsp3 bind RCHY1, thus resulting in down-regulation of the antiviral protein p53
(see below). It is an interesting hypothesis that such
interactions, which are absent from other CoVs because they lack Mac2-3,
might account for a unique pathogenicity-related pathway utilized by
SARS-CoV.


The DPUP (SUD-C) follows the Mac3 domain in SARS-CoV (Fig. 1A). Deletion
of the domain within the context of a SARS-CoV replicon leads to a large
reduction of RNA synthesis, but some basal RTC activity remains, indicating
that the DPUP is not absolutely essential for replication (Kusov et al., 2015).
Currently, three DPUP structures are available, one each from SARS-CoV and
MHV (Table 1; Fig. 5C and D; Johnson et al., 2010; Chen et al., 2015), and the
third one from bat coronavirus HKU9 (Table 1; Hammond et al., 2017). All
DPUPs adopt a similar topology and overall structure. The R.M.S.D. values
between SARS-CoV DPUP and that of MHV or HKU9 are 2.1 Å (for 62 out of
74 Cα atoms; Z-score: 7.1) or 2.0 Å (for 62 out of 77 Cα atoms; Z-score: 7.0),
respectively, according to the Dali server (Holm & Rosenström, 2010). The
DPUP consists of an anti-parallel β sheet with two α helices located N- and
C-terminal to this β sheet (Johnson et al., 2010; Chen et al., 2015). The two α
helices form one plane while the β sheet forms the other; this resembles a
typical frataxin-like fold (Bencze et al., 2006). Proteins featuring the
frataxin-like fold are commonly involved in controlling cellular oxidative stress
by binding iron to maintain the iron homeostasis (Bencze et al., 2006). In the
case of the yeast frataxin homologue Yfh1, cells lacking this gene were
demonstrated to be highly sensitive to H2O2 and elevated metal ion levels
(such as iron and copper) (Foury & Cazzalini, 1997). Several Glu and Asp
residues in the N-terminal α helix of Yfh1 are possibly involved in binding metal
ions (Fig. 5E; He et al., 2004; Bencze et al., 2006). Interestingly, “EEXXXE”
and “DDD” motifs exist in the first helix of the SARS-CoV and MHV DPUP,
respectively, even though the sequence identity of DPUP is only 13% between
these two viruses. Neuman et al. (2008) found that SARS-CoV
Mac2–Mac3–DPUP can bind cobalt ions, while Mac3 alone and Mac2*–Mac3
(2*: C-terminal half of Mac2) cannot. According to these
observations, it is conceivable that the DPUP region binds metal ions.
Furthermore, infection with SARS-CoV can induce transcription of oxidative
stress-related genes of the host (Hu et al., 2012). Any involvement of DPUP in
this biological process is speculative at this time.


The Mac2-3–DPUP oligodomain (SUD) has been shown to interact with Nsp9,
Nsp12, and NAB-βSM-TM1 (see below) of Nsp3 by using a GST pull-down
assay (Imbert et al., 2008). Using Y2H and co-immunoprecipitation (CoIP)
assays, the oligoprotein Ubl1–HVR–Mac1-2-3* (3*, N-terminal third of Mac3)
of SARS-CoV Nsp3 has been found to bind Nsp2, ORF8a, and ORF9b (von
Brunn et al., 2007). However, with the slightly larger region
Ubl1–HVR–Mac1-2-3–DPUP, these interactions were not confirmed in a Y2H
assay (Pan et al., 2008). It seems that DPUP might modulate the various
binding processes. Furthermore, the DPUP subdomain could also regulate the
sequence specificity of RNA binding by Mac3 as mentioned above (Johnson et
al., 2010).


The relative orientation of SARS-CoV Mac2 and Mac3 is fixed by an artificial
disulfide bond and dimer formation in the crystal (Tan et al., 2009). The NMR
structure shows that Mac2 and Mac3 as well as Mac3 and DPUP have no
preferred relative orientations to one another (Johnson et al., 2010). However,
Mac2, Mac3, and DPUP are surrounded by other domains within Nsp3; it is
unclear whether these other domains affect the relative orientation among the
three. More multi-domain structures will be needed to answer this question and
to elucidate the structural basis of mutual influences of these modules onto
each other (see, e.g., above for the influence of the HEV helicase on the
macrodomain of this virus).


5. Ubiquitin-like domain 2 and papain-like protease 2 
Besides the Mac1 (X domain), the largest number of crystal structures for any
Nsp3 domain has been determined for the ubiquitin-like domain 2 (Ubl2) plus
the papain-like protease 2 (PL2pro). So far, structures of this region are
available for SARS-CoV, MERS-CoV, IBV, and MHV (Table 1; Ratia et al.,
2006; Lei et al., 2014; Kong et al., 2015; Chen et al., 2015). Ubl2 and PL2pro
are conserved in all CoVs (Neuman et al., 2008; Neuman, 2016). The exact
functional role of the Ubl2 domain is not clear so far, while the PL2pro was
reported to possess proteolytic, deubiquitinating, and deISGylating activities
(Barretto et al., 2005; Lindner et al., 2005; Yang et al., 2014; Mielech et al.,
2014).


(I) Ubiquitin-like domain 2 (Ubl2)

The Ubl2 is the second ubiquitin-like subdomain located within Nsp3 (Fig. 2C
and 6). The structure of Ubl2 is more conserved among different CoVs than
that of Ubl1. For example, the R.M.S.D. between the Ubl2s of SARS-CoV and
MHV is 1.2 Å (for 58 out of 68 Cα atoms; Z-score: 11.1) according to the Dali
server (Holm & Rosenström, 2010), whereas the corresponding value for the
Ubl1s of the two viruses is 2.8 Å (for 85 out of 93 Cα atoms; Z-score: 7.5).
Some host USPs (with a fold similar to the CoV PLpro) also include one or more
Ub-like domain(s), which is/are used to regulate the catalytic activity as well as
to interact with partners (Komander et al., 2009; Faesen et al., 2012; Pfoh et
al., 2015). For example, the N-terminal Ubl domain of USP14 is critical for its
recruitment to the proteasome, thereby enhancing its catalytic activity (Hu et
al., 2005; Faesen et al., 2012). USP7 (also named “HAUSP”:
Herpesvirus-associated USP) includes five Ubl domains (Ubl 1-5), which are
located at the C-terminus of the protease domain. Ubl4-5 promote Ub binding
and enhance the DUB activity of USP7 by about 100-fold via interacting with
the “switching loop” (Trp285-Phe291) in the USP7 catalytic domain (Faesen et
al., 2012). Ubl2 of USP7 interacts with the HSV-1 immediate-early protein
ICP0 to antagonize the host antiviral response (Pfoh et al., 2015). In contrast
to the variable relative orientations of the Ubl domains and the catalytic domain
of USP7, the Ubl2 domain is anchored to the CoV PL2pro by two salt-bridges in
MERS-CoV and SARS-CoV (Lei et al., 2014), so it is unlikely to regulate the
catalytic activity of PL2pro. In agreement with this conclusion, the presence or
absence of the Ubl2 of SARS-CoV or MERS-CoV shows almost no effect on
the PL2pro activities (Frieman et al., 2009; Clasman et al., 2017).


Currently, several inconsistent roles of Ubl2 are reported. Frieman et al.
(2009) demonstrated that the Ubl2 of SARS-CoV is necessary to antagonize
the host innate immune response via blocking IRF3 or the NF-κB pathway. In
contrast, Clementz et al. (2010) reported that the Ubl2 of SARS-CoV is not
necessary for antagonizing IFN production. Also, Mielech et al. (2015) showed
that the Val787Ser mutation (Nsp3 numbering) in the MHV Ubl2 reduces the
thermal stability of the PL2pro, whereas Clasman et al. (2017) reported that the
Ubl2 of MERS-CoV does not affect PL2pro thermal stability. This Val
residue of MHV is conserved in SARS-CoV and MERS-CoV. It is located in the
first strand (β1) and contributes to the hydrophobic core of Ubl2; therefore, the
Val-to-Ser change might disrupt the global Ubl2 structure, leading to a
decrease in the stability of the PL2pro domain (Mielech et al., 2015).


On the basis of molecular dynamics simulations, the MERS-CoV Ubl2 has
recently been proposed to display more molecular flexibility when the PL2pro
binds ubiquitin, compared to the situation in the free enzyme. The authors
speculate that the difference in flexibility of the Ubl2 might regulate the
interaction with downstream targets, thereby modulating the innate immune
response (Alfuwaires et al., 2017). Ubiquitination and deubiquitination can
regulate not only the immune response but also the cell cycle, DNA damage
repair, cellular growth, etc. (Welchman et al., 2005), and these processes
involve a large number of host proteins. Among these, the coronavirus PL2pro
has to select its specific targets, such as the host innate-immune
system-related proteins TRAF3, STING, TBK1, IRF3, etc. (Chen et al., 2014;
Lei & Hilgenfeld, 2017), with the goal of facilitating efficient virus survival. We
therefore speculate that the Ubl2 might act as a modulator helping the PL2pro
recognize its specific targets during coronavirus infection. However, this idea
needs to be verified by future research.


(II) Papain-like protease 2 (PL2pro)

The PL2pro adopts an extended right-hand fold with thumb, palm, and fingers
subdomains, similar to the TGEV PL1pro (Fig. 6; Ratia et al., 2006; Lei et al.,
2014; Lee et al., 2015; Kong et al., 2015; Chen et al., 2015; Clasman et al.,
2017) and human USPs (e.g. USP14, USP7; Ratia et al., 2006). A zinc ion is
coordinated by four cysteines from two β hairpins in the fingers subdomain and
forms a zinc-finger motif. Although the conformations of the zinc finger are
variable between different PL2pros (Lei et al., 2014; Lee et al., 2015; Kong et al.,
2015; Chen et al., 2015), the motif is essential for structural stability and
proteolytic activity (Barretto et al., 2005). The catalytic site of PL2pro comprises
the typical Cys-His-Asp triad, just like the PL1pro of TGEV (see above). The
catalytic Cys is located in the thumb subdomain (at the N terminus of helix 4 of
SARS-CoV and MERS-CoV PL2pro; Ratia et al., 2006; Lei et al., 2014),
whereas the His as well as the Asp are located in the palm subdomain. In the
free PL2pro, the catalytic triad Cys-His-Asp is pre-formed, unlike in USP7,
where the catalytic residues are only well aligned upon Ub binding to the
enzyme (Hu et al., 2002). As we mentioned above, the oxyanion hole of
papain-like proteases normally comprises a Gln or Asn side-chain 5 or 6
residues N-terminal to the catalytic Cys. This situation is found in the MHV
PL2pro (Chen et al., 2015), but the corresponding residues are Trp, Leu, and
Trp in the enzymes of SARS-CoV, MERS-CoV, and IBV, respectively (Ratia et
al., 2006; Lei et al., 2014; Kong et al., 2015). Nevertheless, the indole-ring
nitrogen of Trp can form a hydrogen bond with the oxyanion intermediate of
substrate hydrolysis. The protease activity of the SARS-CoV PL2pro is
abolished upon a Trp-to-Ala mutation (Ratia et al., 2006). In contrast, the Leu
of MERS-CoV PL2pro totally lacks the ability to contribute to oxyanion
stabilization via a hydrogen bond (Lei et al., 2014). The deficient oxyanion hole
of MERS-CoV PL2pro causes an about 100-fold lower proteolytic activity
compared to that of the SARS-CoV PL2pro when using
Arg-Leu-Arg-Gly-Gly-7-amino-4-methylcoumarin (RLRGG-AMC) as a
substrate (Báez-Santos et al., 2014). Meanwhile, the corresponding activity of
the Leu-to-Trp mutant of MERS-CoV PL2pro is about 50-fold higher than that
of the wild-type enzyme, using the same substrate (Lei et al., 2014). As we
mentioned before (Lei & Hilgenfeld, 2016), the efficiency of viral proteases
does not always have to be optimized during virus evolution. Rather, the
creation of temporary intermediates of polyprotein cleavage, in the right
temporal order, is necessary for correct virus replication (Kanjanahaluethai &
Baker, 2000; Gosert et al., 2002; Harcourt et al., 2004); thus, the proper but
not necessarily the highest protease activity is beneficial for virus survival.


In order to investigate the mechanism of the DUB and deISGylating activities
of CoV PLpros, the structure of the enzyme in complex with ubiquitin (or ISG15)
is important. Until now, structures of SARS-CoV and MERS-CoV PL2pro with
mono-Ub as well as of SARS-CoV PL2pro with di-Ub have been obtained
(Chou et al., 2014; Ratia et al., 2014; Békés et al., 2016; Bailey-Elkin et al.,
2014; Lei & Hilgenfeld, 2016). Very recently, the structure of SARS-CoV PL2pro
in complex with the C-terminal Ubl domain of hISG15 or mISG15 has also
been reported (Daczkowski et al., 2017). These structures show that the
PL2pro of SARS-CoV possesses two ubiquitin-binding sites (named Ub1 and
Ub2 sites here; Ratia et al., 2014; Békés et al., 2016). From the prior structure
of USP14 in complex with ubiquitin, it is known that two blocking loops (BL1
and BL2) regulate substrate binding (Hu et al., 2005). In contrast, only
the BL2 exists in CoV PL2pros and is involved in substrate binding (Fig. 6; Chou
et al., 2014; Ratia et al., 2014; Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016),
whereas BL1 is absent in CoV PL2pros (Ratia et al., 2006; Lei et al., 2014).


The proximal Ub binding site (Ub1) is, to a certain degree, conserved between
the PL2pros of SARS-CoV and MERS-CoV. The region includes the narrow
substrate channel between the thumb and the palm subdomains, as well as a
hydrophobic patch in the fingers subdomain (Fig. 6). The narrow substrate
channel binds the C-terminal RLRGG residues of ubiquitin (Chou et al., 2014;
Ratia et al., 2014; Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016; in order to
be clear, Ub residues appear in italics here). The C-terminal RLRGG of
ubiquitin is similar to the unprimed side of the polyprotein substrates,
(R/K)(L/I)XGG, in the two viruses. The S1, S2, and S4 pockets are well
conserved to accommodate the two small glycines (P1, P2) and the
hydrophobic P4 residue (Leu or Ile). In contrast, the flexible side-chains in P3
and P5 feature binding patterns that are slightly different between SARS-CoV
and MERS-CoV PL2pro. In the SARS-CoV PL2pro(Cys112Ser)–Ub complex,
P3-Arg forms a weak salt-bridge with Glu162 (Chou et al., 2014), whereas the
corresponding P3-Arg is exposed to solvent in the MERS-CoV complex (Lei &
Hilgenfeld, 2016). On the other hand, the P5-Arg is exposed to solvent in the
SARS-CoV complex (Chou et al., 2014) but forms a strong salt-bridge with
Asp164 in MERS-CoV (Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016).
Interestingly, this Asp164 is unique among CoV PL2pros, and the Asp164Ala
replacement leads to an about 4.5-fold and 3.5-fold reduction of the proteolytic
and DUB activities, respectively (Lei & Hilgenfeld, 2016). As just mentioned,
the proteolytic activity of the MERS-CoV PL2pro is not optimized due to the
deficient oxyanion hole. On the other hand, the virus requires a strong DUB
activity to counteract the host immune response. The suboptimal enzyme
activities may be partly compensated by the unique Asp164 (Lei & Hilgenfeld,
2016).

In addition to the binding of the Ub C-terminus to the substrate channel, there
is an interaction between a hydrophobic region of the SARS-CoV and
MERS-CoV PL2pros in the fingers subdomain and a hydrophobic patch (Ile44,
Ala46, Gly47) of Ub (Chou et al., 2014; Ratia et al., 2014; Bailey-Elkin et al.,
2014; Lei & Hilgenfeld, 2016; Békés et al., 2016). This hydrophobic patch of
Ub is commonly used to interact with Ub-binding proteins (Dikic et al., 2009).
The fingers subdomain residues involved are Tyr208 and Met209 in
SARS-CoV, and Tyr209 and Val210 in MERS-CoV (Chou et al., 2014; Ratia et
al., 2014; Bailey-Elkin et al., 2014; Lei & Hilgenfeld, 2016; Békés et al., 2016).
Moreover, these hydrophobic interactions between the PL2pro and Ub are
important for the DUB activity of the enzyme, because disrupting them via a
Val210Arg mutation dramatically diminishes the DUB activity in MERS-CoV
PL2pro (Bailey-Elkin et al., 2014).
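
The (R/K)(L/I)XGG consensus described above for the unprimed side (P5 to P1)
of the polyprotein cleavage sites lends itself to a simple pattern search. The
sketch below is purely illustrative: the function name and the test sequences
are hypothetical, and a match only indicates sequence compatibility with the
consensus, not actual cleavage by PL2pro.

```python
import re

# (R/K)(L/I)XGG, with X being any residue; the lookahead allows overlapping hits.
CONSENSUS = re.compile(r"(?=([RK][LI].GG))")


def find_consensus_sites(sequence: str) -> list[tuple[int, str]]:
    """Return (1-based position of the P5 residue, matched pentapeptide)
    for every occurrence of the consensus in a one-letter protein sequence."""
    return [(m.start() + 1, m.group(1)) for m in CONSENSUS.finditer(sequence)]


ub_c_terminus = "RLRGG"  # C-terminal residues of ubiquitin, as given in the text
print(find_consensus_sites(ub_c_terminus))  # prints [(1, 'RLRGG')]
```

The C-terminal RLRGG of ubiquitin satisfies the pattern, in line with its
similarity to the polyprotein substrates.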


Near the hydrophobic patch of Ub, Arg42 forms a salt-bridge with Glu168 of
PL2pro in two structures of the SARS-CoV PL2pro in complex with
mono-ubiquitin or Lys48-linked di-Ub (Chou et al., 2014; Ratia et al., 2014;
Békés et al., 2016). However, this Glu is replaced by Arg in MERS-CoV PL2pro,
resulting in Arg42 instead forming a salt-bridge with Asp165 in the MERS-CoV
PL2pro–ubiquitin complex (Lei & Hilgenfeld, 2016). This illustrates that various
fine-tuned binding patterns exist between Ub and PL2pros in different CoVs.


Besides the Ub1 binding site, the Ub2 binding site is mapped by the complex
of SARS-CoV PL2pro with Lys48-linked di-Ub (Fig. 6; Békés et al., 2016). The
Ub2 binding site is located at the first α helix of the thumb subdomain. Phe70
interacts with the common hydrophobic patch (Ile44, Ala46, Gly47) of Ub.
Interestingly, MERS-CoV PL2pro seems to lack the corresponding Ub2 binding
site. Phe70 of SARS-CoV PL2pro is changed to Lys69 in MERS-CoV (Békés et
al., 2016). In addition, Békés et al. (2016) predicted that Trp107 and Ala108
could constitute the Ub1' binding site in SARS-CoV PL2pro. The
Trp107Leu/Ala108Ser double mutation reduces the enzyme's activity towards
Lys48-linked tri-Ub-AMC by about 75% (Békés et al., 2016). However, it
should be noted that Trp107 contributes to the oxyanion hole of SARS-CoV
PL2pro (see above); therefore, the reduced DUB activity upon replacing Trp107
by Leu is perhaps not due to altering the Ub1' binding site, but rather to
destroying the oxyanion hole.


The SARS-CoV PL2pro displays more efficient cleavage activity towards
Lys48-linked di-Ub-AMC than Lys63-linked di-Ub-AMC substrates in vitro,
demonstrating that the PL2pro preferentially recognizes Lys48-linked polyUb
chains (Báez-Santos et al., 2014; Békés et al., 2015, 2016). In contrast,
MERS-CoV PL2pro processes Lys48- and Lys63-linked polyUb chains with
similar efficiency (Báez-Santos et al., 2014). Lys48-linked Ub chains mainly
cause target protein degradation via the 26S proteasome, while Lys63-linked
polyUb is mainly related to DNA repair and signal transduction (Ikeda & Dikic,
2008), in particular, in the signal transduction cascades of the host innate
immune system (Dikic & Dötsch, 2009). However, the biological significance of
the CoV PLpros showing different cleavage activities on Lys48- and
Lys63-linked polyUb is still unclear. Furthermore, the SARS-CoV PL2pro
cleaves the polyUb chain by removing di-Ubs, not mono-Ub units as in
MERS-CoV (Békés et al., 2015). This strongly suggests that MERS-CoV
PL2pro possesses the Ub1 and Ub1' binding sites but not a Ub2 site, consistent
with the Phe70-to-Lys replacement in MERS-CoV PL2pro as just mentioned.


At the same time, ISG15 utilizes a different Ub2 binding site of SARS-CoV
PL2pro, compared to Lys48-linked di-Ub (Békés et al., 2016), but no structure
for a full-length ISG15–CoV PL2pro complex is available so far. Daczkowski et
al. (2017) reported that the C-terminal domains of ISG15s (similar to Ub1
mentioned above) from different species have different binding characteristics
with SARS-CoV PL2pro according to two structures, the PL2pro in complex with
the C-terminal domain of hISG15 and mISG15, respectively. In addition, the
structure of mouse USP18 in complex with full-length mISG15 became
available in 2017 (Basters et al., 2017). Surprisingly, the N-terminal Ubl
domain of mISG15 shows almost no interaction with mUSP18. Does ISG15
behave similarly when binding to the CoV PLpro? How does the N-terminal
domain of ISG15 of different species recognize the cognate CoV PLpro? It
would be of interest to determine not only the structure of a full-length
hISG15–HCoV PLpro complex but also that of mISG15 with MHV PLpro.


The DUB and deISGylating activities of CoV PLpros are well established, but
the detailed mechanism by which the PLpro antagonizes the host innate immune
response is still ambiguous (see Lei & Hilgenfeld, 2017, for a recent review).
Various cytokines (including interferons (IFNs) and tumor necrosis factors
(TNFs)) are produced to inhibit virus replication by two main pathways, the
IRF3 pathway and the NF-κB pathway (Seth et al., 2006; Hiscott et al., 2006).
For more information on the signaling pathways of the host innate immune
system, the reader should consult other reviews (e.g., Mogensen, 2009; Lei &
Hilgenfeld, 2017). Devaraj et al. (2007) found that the SARS-CoV PL2pro can
directly bind IRF3 to block its phosphorylation, dimerization, and nuclear
translocation, thereby inhibiting IFN-β induction. Furthermore, the PL2pro was
found not to block the NF-κB signaling pathway, and the protease activity was
described as dispensable for antagonizing the IFN response (Devaraj et al.,
2007). Clementz et al. (2010) also confirmed that the enzyme activity of
HCoV-NL63 PL2pro is not essential for counteracting antiviral IFN
production. In contrast, Frieman et al. (2009) reported that the SARS-CoV
PL2pro does not directly bind IRF3 or disrupt its phosphorylation. Instead, the
PL2pro was proposed to inhibit the NF-κB signaling pathway by stabilizing its
inhibitor, IκBα (Frieman et al., 2009). Furthermore, the protease activity of
SARS-CoV PL2pro is important for blocking the TNF-α/NF-κB signaling
pathway (Frieman et al., 2009). In addition, the HCoV-NL63 but not the MHV
PL2pro has the ability to impede the IRF3 and NF-κB pathways, indicating that
the functions of the PL2pro are specific for different CoVs (Frieman et al., 2009).
Later, a protein comprising the SARS-CoV PL2pro and the TM (transmembrane
region of Nsp3) was demonstrated to inhibit the STING/TBK1/IKKε-mediated
signaling pathway (upstream regulators of IRF3; Chen et al., 2014), thereby
disrupting IRF3 phosphorylation and dimerization, and blocking the type-I IFN
response. SARS-CoV PL2pro plus TM can also physically interact with the
STING–TRAF3–TBK1 complex and remove the ubiquitins from ubiquitinated
RIG-I, STING, TRAF3, TBK1, as well as IRF3 (Chen et al., 2014). In 2016, it
was reported that the SARS-CoV PL2pro can inhibit the Toll-like receptor 7
(TLR7)-mediated type-I IFN response and the NF-κB pathway by removing
the Lys63-linked polyUb chain from TRAF3 and TRAF6 (upstream regulators
of IRF3 and NF-κB; Li et al., 2016b). Interestingly, the SARS-CoV PL2pro only
removes the Lys63- but not the Lys48-linked polyUb chain from TRAF3 and
TRAF6 in vivo (Li et al., 2016b). On the other hand, Baez-Santos et al. (2014)
and Békés et al. (2015, 2016) have shown that SARS-CoV PL2pro prefers to
digest Lys48- over Lys63-linked polyUb chains in vitro (see above). Why does
the substrate specificity of PL2pro seem to be different in vivo and in vitro?
Does any other factor influence the substrate specificity of PL2pro in vivo when
counteracting the cellular innate immune response? These questions are yet
to be answered.


In addition, the HCoV-NL63 PL2pro was shown to block the p53–IRF7–IFN-β
signaling pathway (Yuan et al., 2015). p53 can induce type-I interferon
production via IRF7 (interferon regulatory factor 7; Yuan et al., 2015).
Meanwhile, p53 can be degraded via the ubiquitin-proteasome system mediated
by MDM2, an E3 ubiquitin ligase (Haupt et al., 1997). Yuan et al. (2015)
found that the HCoV-NL63 PL2pro deubiquitinates and stabilizes MDM2 to
augment p53 degradation, thereby antagonizing the host innate immune
response. Recently, the PL2pro of SARS-CoV and MERS-CoV as well as the
PL1pro/PL2pro of HCoV-NL63 were shown to directly interact with the host E3
ubiquitin ligase RCHY1 (also called Pirh2; Ma-Lauer et al., 2016), thereby
increasing the stability of the latter. Like MDM2, RCHY1 can induce p53
degradation as well (Leng et al., 2003). Ma-Lauer et al. (2016) found that p53
inhibits the replication of SARS-CoV. Stabilization of RCHY1 by physical
interaction with the PL2pro increases the degradation of p53 and supports
coronavirus replication (Ma-Lauer et al., 2016). While the HCoV-NL63 PL2pro
stabilizes MDM2 by deubiquitinating it (Yuan et al., 2015), the SARS-CoV
PL2pro surprisingly does not deubiquitinate RCHY1 (Ma-Lauer et al., 2016).
How does the PL2pro stabilize RCHY1? The mechanism has yet to be
elucidated.


Besides the functions of PL2pro discussed above, the enzyme was shown to
interact with other viral proteins. The region from PL2pro to the C-terminus of
Nsp3 in SARS-CoV can interact with the Nsp2, ORF3a, and ORF9b proteins,
as identified by Y2H and CoIP assays (von Brunn et al., 2007). Through similar
assays, the region PL2pro–NAB–βSM was found to interact with Nsp4 as well
as Nsp12 (Pan et al., 2008). The SARS-CoV PL2pro was further shown to bind
ORF7a and Nsp6 by proteomics analysis (Neuman et al., 2008).


Coronavirus PLpro is an important target for developing antiviral drugs. This
aspect has been well reviewed by Baez-Santos et al. (2015) within this series;
hence, we mention here only inhibitors that have been described since. Two
major challenges exist when designing PLpro inhibitors: 1) the S1 and S2 binding
pockets are tailor-made to accommodate glycine residues and hence they are
small; therefore, identifying suitable peptidomimetic chemical structures is
difficult; 2) many host USPs feature folds and active sites similar to the PLpros,
so specificity of the inhibitors could be an issue. However, there is a good
chance that the BL2 loop (mentioned above) of CoV PL2pros could provide
sufficient uniqueness to solve the specificity problem. This loop is involved in
substrate binding and is different not only between USPs and CoV PLpros but
also among different CoVs (Hu et al., 2005; Ratia et al., 2006; Lei et al., 2014;
Baez-Santos et al., 2014, 2015; Lee et al., 2015). For example, this loop
comprises 6 amino-acid residues (GNYQCG) in SARS-CoV PL2pro but 7
(GIETAVG) in the enzyme of MERS-CoV, leading to the inability of SARS-CoV
inhibitors to act on MERS-CoV PLpro (Baez-Santos et al., 2014; Hilgenfeld,
2014; Lee et al., 2015). Using a high-throughput assay, the purine derivative
8-(trifluoromethyl)-9H-purin-6-amine (compound 4; Fig. 7A) was identified as a
competitive MERS-CoV PL2pro inhibitor, with an IC50 of about 6 µM in vitro (Lee
et al., 2015). Interestingly, this compound is also (moderately) active against
SARS-CoV PL2pro (IC50 11 µM) but acts as an allosteric inhibitor in this case
(Lee et al., 2015). Furthermore, the authors also reported that this inhibitor
shows very high selectivity against human ubiquitin C-terminal hydrolase
(hUCH-L1; IC50 > 100 µM), which is one of the host proteins most closely
related to the CoV PLpro (Lee et al., 2015). In contrast, Clasman et al. (2017)
reported that compound 4 shows no selective inhibition of either CoV PLpros or
host USPs; therefore, this compound could be a pan-assay interference
compound (PAINS). Recently, nine alkylated chalcones (1-9) and four
coumarins (10-13), which were isolated from the perennial plant Angelica
keiskei, were tested for their inhibitory activities against both the SARS-CoV
Mpro (3CLpro, chymotrypsin-like protease) and the PL2pro (Park et al., 2016).
One of the chalcones, compound 6 (Fig. 7B), exhibited relatively strong
inhibition of both the 3CLpro and the PL2pro in vitro, with IC50 values of 11.4
and 1.2 µM, respectively (Park et al., 2016). Chalcone 6 uses different inhibition
mechanisms for 3CLpro and PL2pro: it is a competitive inhibitor of the former
enzyme but a non-competitive one of the latter (Park et al., 2016). Clearly, the
large body of structural information available for the CoV PLpros and host DUBs
should facilitate the design of inhibitors specific for the viral enzyme.
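The IC50 values quoted in this paragraph can be compared as simple ratios. The following is a minimal sketch, not part of the cited studies: the helper name `fold_selectivity` is hypothetical, the value for compound 4 against MERS-CoV PL2pro is the approximate figure given in the text ("about 6 µM"), and the hUCH-L1 value is only a lower bound (> 100 µM), so the ratio derived from it is a minimum estimate.

```python
# IC50 values in µM, taken from the text above (Lee et al., 2015; Park et al., 2016).
# Assumption: ">100 µM" for hUCH-L1 is stored as 100.0, i.e. as a lower bound.
ic50_uM = {
    ("compound 4", "MERS-CoV PL2pro"): 6.0,   # "about 6 µM" in the text
    ("compound 4", "SARS-CoV PL2pro"): 11.0,
    ("compound 4", "hUCH-L1"): 100.0,         # reported as > 100 µM
    ("chalcone 6", "SARS-CoV 3CLpro"): 11.4,
    ("chalcone 6", "SARS-CoV PL2pro"): 1.2,
}


def fold_selectivity(compound: str, target: str, off_target: str) -> float:
    """Return IC50(off_target) / IC50(target); values > 1 favour the target."""
    return ic50_uM[(compound, off_target)] / ic50_uM[(compound, target)]


# Chalcone 6: how much more potent against PL2pro than against 3CLpro?
chalcone_ratio = fold_selectivity("chalcone 6", "SARS-CoV PL2pro", "SARS-CoV 3CLpro")

# Compound 4: minimum selectivity for MERS-CoV PL2pro over the host enzyme hUCH-L1.
host_ratio_min = fold_selectivity("compound 4", "MERS-CoV PL2pro", "hUCH-L1")

print(f"chalcone 6, PL2pro vs 3CLpro: {chalcone_ratio:.1f}-fold")
print(f"compound 4, MERS-CoV PL2pro vs hUCH-L1: at least {host_ratio_min:.1f}-fold")
```

Such ratios only summarize the reported in-vitro numbers; they say nothing about the pan-assay interference concern raised by Clasman et al. (2017).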


6. Nucleic Acid-Binding (NAB) domain and betacoronavirus-specific
marker (βSM) domain

The nucleic acid-binding (NAB) and betacoronavirus-specific marker (βSM)
domains together are also named “Nsp3e” (Neuman et al., 2008). The latter
domain alone was previously called “group 2-specific marker” (G2M) (Neuman
et al., 2008). The NAB and βSM domains exist in the genus Betacoronavirus.
The corresponding region is absent in alphacoronaviruses and
deltacoronaviruses (Neuman, 2016). In gammacoronaviruses, there is a
gammacoronavirus-specific marker (γSM) domain at this position (Neuman,
2016).

Structural information on this region is very limited for all coronaviruses. Thus
far, only an NMR structure of the NAB domain of SARS-CoV is available
(Table 1; Fig. 8; Serrano et al., 2009). The structure comprises two antiparallel
β sheets (β1+β6; β2+β8) and one parallel β sheet (β3–β4–β5–β7) as well as
two α helices and two 3_10 helices (η1 and η2) in the order
β1–β2–β3–α1–β4–β5–η1–η2–β6–β7–α2–β8. Four β strands (β3–β4–β5–β7)
and two helices (α1, α2) form a “half-barrel”. The structure of the NAB
represents a unique fold (Serrano et al., 2009). The domain has been shown to
bind ssRNA as well as to unwind dsDNA (Neuman et al., 2008). When binding
to ssRNA, the NAB prefers sequences with repeats of three consecutive Gs
(Serrano et al., 2009), such as (GGGA)5 and (GGGA)2. A positively charged
surface patch (Lys75, Lys76, Lys99, and Arg106) is involved in RNA binding
(Fig. 8). These residues are located in the loop between η2 and β6 as well as
in helix α2 (Serrano et al., 2009). The RNA-binding behavior of the NAB
appears to be similar to that of SARS-CoV Mac3 (SUD-M), which has a
specificity for oligo(G) (Tan et al., 2007, 2009), although the latter is also
reported to bind oligo(A) (Chatterjee et al., 2009; Johnson et al., 2010,
mentioned above). Whether there is a functional relation between Mac3 and
NAB remains to be investigated.

Currently, no structural information is available concerning the βSM or γSM,
and nothing is known about the function of these modules either. A gene
encoding the βSM domain of SARS-CoV could not be expressed in E. coli; this
module has been predicted to be a nonenzymatic domain (Neuman et al.,
2008). In the absence of sequence similarity to any domain of known function,
we performed an ab-initio protein structure prediction using the sequence of
the SARS-CoV βSM domain and the QUARK online server (Xu & Zhang,
2012). The result indicates that most of this region is intrinsically disordered.
This does not preclude that it might adopt a defined structure upon interaction
with another Nsp or RNA, or a host protein.
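The secondary-structure order given in this section for the SARS-CoV NAB domain can be cross-checked against the stated sheet composition. The short sketch below is illustrative only; the variable names are our own, and the strand pairings are those listed in the text (two antiparallel sheets β1+β6 and β2+β8, one parallel sheet β3–β4–β5–β7).

```python
from collections import Counter

# Order of secondary-structure elements as given in the text (Serrano et al., 2009).
# β = strand, α = alpha helix, η = 3_10 helix.
topology = "β1–β2–β3–α1–β4–β5–η1–η2–β6–β7–α2–β8"
elements = topology.split("–")

# Count elements by type, using the leading symbol of each label.
counts = Counter(label[0] for label in elements)

# Sheet composition as stated in the text.
parallel_sheet = ["β3", "β4", "β5", "β7"]
antiparallel_sheets = [["β1", "β6"], ["β2", "β8"]]

# Every strand in the topology should belong to exactly one of the three sheets.
strands_in_sheets = set(parallel_sheet) | {s for pair in antiparallel_sheets for s in pair}
all_strands = {label for label in elements if label.startswith("β")}

print(dict(counts))
print("all strands assigned to a sheet:", strands_in_sheets == all_strands)
```

The check confirms that the eight strands, two α helices and two 3_10 helices named in the topology string are all accounted for by the sheet description.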


7. Transmembrane regions (TM1 and TM2), Nsp3 ectodomain, Y1 domain,
and CoV-Y domain

This part of Nsp3 includes two transmembrane regions as well as three soluble
domains, which together constitute about one third of the multidomain protein.
The two transmembrane regions are TM1 and TM2, while the three domains
are the Nsp3 ectodomain (3Ecto), Y1, and CoV-Y. The sequential order of this
part is TM1–3Ecto–TM2–Y1–CoV-Y (Fig. 1A and B). Even though this part
exists in all coronaviruses (Neuman et al., 2008; Neuman, 2016), no
three-dimensional structure is available so far for the entire region or for any
part of it.

Nsp3 of CoVs is thought to pass the ER membrane twice, since there are two
predicted transmembrane regions, TM1 and TM2 (Harcourt et al., 2004;
Kanjanahaluethai et al., 2007; Oostra et al., 2008). According to the
transmembrane region prediction server TMHMM (Krogh et al., 2001), there is
a total of three hydrophobic regions in SARS-CoV Nsp3 (Table 1; Fig. 1B).
Oostra et al. (2008) proposed that the first two of the three hydrophobic
regions span the membrane while the last one (AH1), which has more
amphipathic character, does not (Fig. 1B). Thus, the 3Ecto would be the only
domain located on the lumenal side of the ER in SARS-CoV Nsp3 (Fig. 1B).
The 3Ecto is thought to bind metal ions and has previously been designated a
zinc-finger (ZF) domain (Neuman et al., 2008). Neuman (2016) found
that the metal-binding Cys-His cluster is not conserved in all CoVs and
renamed this domain “3Ecto”. In fact, only two cysteine residues are
conserved in the CoV 3Ecto domain (Fig. 9A); hence, this domain is unlikely to
be a zinc-finger domain. The transmembrane regions plus the 3Ecto are
important for the PL2pro to process the Nsp3/4 cleavage site in SARS-CoV
and MHV (Harcourt et al., 2004; Kanjanahaluethai et al., 2007); a possible
reason is that the transmembrane part could bring the PL2pro close to the
cleavage site between the membrane-associated proteins Nsp3 and Nsp4.
Asparagine (N)-linked glycosylation has been found in the 3Ecto domains of
SARS-CoV and MHV (Fig. 1B and 9; Harcourt et al., 2004; Kanjanahaluethai
et al., 2007). It is unclear if the N-glycan modification affects the 3Ecto
conformation or stability. Frequently, N-linked glycans serve as recognition
points for partner molecules (Aebi, 2013). It has been shown that interaction of
the 3Ecto with the lumenal loop of Nsp4 is essential for the ER rearrangements
occurring in cells infected by SARS-CoV or MHV (the 3Ecto is named “lumenal
loop of Nsp3” in that paper; Hagemeijer et al., 2014).
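The sizes of the membrane-associated regions discussed in this section follow directly from the SARS-CoV residue numbers listed in Table 1. The sketch below is a simple worked calculation under two stated assumptions: residue 1922 (the last residue of Y1+CoV-Y in Table 1) is taken as the C-terminus of Nsp3, and the helper name `region_length` is our own.

```python
# SARS-CoV Nsp3 residue ranges (inclusive) as listed in Table 1.
# TM1, TM2 and AH1 boundaries are TMHMM predictions (Krogh et al., 2001).
regions = {
    "TM1": (1391, 1413),
    "3Ecto": (1414, 1495),
    "TM2": (1496, 1518),
    "AH1": (1523, 1545),
    "Y1+CoV-Y": (1546, 1922),
}

# Assumption: residue 1922 is the C-terminal residue of Nsp3.
nsp3_length = 1922


def region_length(start: int, end: int) -> int:
    """Number of residues in an inclusive range."""
    return end - start + 1


lengths = {name: region_length(*bounds) for name, bounds in regions.items()}

# Fraction of Nsp3 covered from the start of TM1 to the C-terminus.
span = region_length(regions["TM1"][0], nsp3_length)
fraction = span / nsp3_length

for name, n in lengths.items():
    print(f"{name}: {n} residues")
print(f"TM1 to C-terminus: {span} residues ({fraction:.1%} of Nsp3)")
```

Counted from the first residue of TM1, the region covers somewhat less than one third of Nsp3; the "C-terminal third" construct mentioned later in this section additionally includes part of the βSM domain.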


The Y1 and CoV-Y domains are located at the cytosolic side of the ER. The Y1
domain is conserved in all viruses of the order Nidovirales, while CoV-Y is only
conserved in all coronaviruses (Neuman, 2016). Since no three-dimensional
structure is available for this part, the domain assignment of Y1 and CoV-Y is
ambiguous (Neuman, 2016). We found that the sequence identity of
Y1+CoV-Y between different CoV genera is above 25% and that two Cys-His
clusters are present in the N-terminal part of the Y1 domain, possibly binding
zinc ions (Fig. 9). However, it is still unclear if the fold and function of this
region are conserved. Currently, functional information on this part is limited. It
has been shown that the C-terminal third of Nsp3 (βSM (partial)–TM1–3Ecto–
TM2–AH1–Y1+CoV-Y) binds less efficiently to Nsp4
without the Y1 and CoV-Y domains (Hagemeijer et al., 2014), although these
two domains are not as important for this process as the 3Ecto.

According to a Y2H screen, CoIP, as well as GST pull-down assays, different
constructs of Nsp3 with different C-terminal regions were identified to interact
with various viral non-structural proteins of SARS-CoV (von Brunn et al., 2007;
Imbert et al., 2008; Pan et al., 2008). For example, a construct comprising the
domains from PL2pro to the end of Nsp3 can bind Nsp2, ORF3a, and ORF9b
(see above; von Brunn et al., 2007); the NAB–βSM–TM1 of Nsp3 can interact
with Nsp5, Nsp7–8, as well as Nsps 12–16, and Y1 plus CoV-Y interacts
with Nsp9 and Nsp12 (Imbert et al., 2008); in addition, the NAB–βSM–TM1 of
Nsp3 can also interact with other domains within Nsp3, except for Mac1 (X
domain) (Imbert et al., 2008); a PL2pro–NAB–βSM–TM1 construct of Nsp3 can
bind Nsp4 and Nsp12, while the region from TM1 to the end of Nsp3 only binds
Nsp8 (Pan et al., 2008). It has been found that the interaction between the
C-terminal region of Nsp3 and Nsp4 is essential for the formation of CMs and
DMVs derived from the ER in CoV-infected cells (Angelini et al., 2013;
Hagemeijer et al., 2014). The viral RNA and replicase proteins (Nsps) need to
be associated with these modified membranes to form the replicative
organelles (see Neuman, 2016, for review). In addition, these membranes can
protect the viral RNA and Nsps against nucleases and proteases in vitro (van
Hemert et al., 2008). Apart from the Nsp3–Nsp4 interaction, it is still unclear
whether all the other interactions really exist or how they affect the
viral life cycle. At least, it seems that the membrane-associated region of Nsp3
may regulate the interactions with other viral proteins. More effort clearly needs
to be put into the structural and functional characterization of this
region.


Conclusions 

Overall, the multi-domain Nsp3 plays various roles in coronavirus infection. It
releases Nsp1, Nsp2, and itself from the polyproteins and interacts with other
viral Nsps as well as RNA to form the replication/transcription complex. It acts
on posttranslational modifications of host proteins to antagonize the host
innate immune response (by de-MARylation, de-PARylation (possibly),
deubiquitination, or deISGylation). Meanwhile, Nsp3 itself is modified in host
cells, namely by N-glycosylation of the 3Ecto domain. Furthermore, Nsp3 can
interact with host proteins (such as RCHY1) to support virus survival.


As the largest non-structural protein of CoVs, Nsp3 has also been identified as
the major selective target for driving evolution in lineage C betaCoVs on the
basis of a high rate of positively selected mutation sites (Forni et al., 2016).
Furthermore, the adaptive evolution of Nsp3 of MERS-CoV is still ongoing
(Forni et al., 2016). For example, the Arg911Cys mutation (located in the palm
subdomain of the PL2pro, corresponding to Arg283 in Lei et al., 2014) of Nsp3
exists in the viral strain KOR/KNIH responsible for the 2015 South Korean
outbreak but not in the ancestral strain EMC/2012 (Forni et al., 2016). It is
interesting to speculate why coronaviruses keep many essential functions in
one protein, while this protein shows a high rate of genetic diversity during CoV
evolution. In the end, increased research efforts into the structure and function
of Nsp3 are needed to achieve a more complete understanding of this protein.

Legends to figures

Fig. 1. Genome organization of coronaviruses; Nsp3 domains and their
functions. (A) The 5'-terminal two thirds of the CoV genome comprise ORF1a
and ORF1b. ORF1a encodes the polyprotein 1a (Nsp1–11) while ORF1a plus
ORF1b produce the polyprotein 1ab (Nsp1–16) through a ribosomal frameshift
overreading the stop codon of ORF1a (indicated by a black arrow). The
3'-proximal third encodes the structural proteins S, E, M, and N as well as
accessory proteins. The polyproteins pp1a and pp1ab are processed by the
viral proteases PL1pro, PL2pro (both domains of Nsp3), and Mpro (3CLpro, Nsp5).
The domain organization of Nsp3 is different in different CoV genera. The
Nsp3 of HCoV-NL63 as a representative of alpha-CoVs, and of SARS-CoV in
clade B of the genus beta-CoV, are shown enlarged. The question mark within
HCoV-NL63 Nsp3 indicates a region of unknown function and structure. (B)
Summary of the functions and domain organization of SARS-CoV Nsp3. Nsp3
is bound to double-membrane vesicles recruited from the endoplasmic
reticulum (ER) membrane. The protein passes through this membrane twice,
via the two transmembrane regions TM1 and TM2. AH1 is possibly an
amphipathic helix attached to the ER membrane, next to TM2. Except for the
3Ecto domain, all other Nsp3 domains are located in the cytosol. All domains
with known three-dimensional structures are indicated in light green (X-ray
structures) or orange (NMR structures), whereas parts with unknown structure
are in red. The best characterized functions of each domain of Nsp3 are shown.
The glycosylation sites in the 3Ecto domain (Asn1431 and Asn1434; Harcourt
et al., 2004) are marked.


Fig. 2. Structures (in cartoon view) of the ubiquitin-like domain 1 (Ubl1) and
Ubl2 in SARS-CoV, Ubl1 in MHV, as well as their structural homologues. (A)
Ubl1 (residues 20–108) of SARS-CoV (PDB entry: 2IDY; Serrano et al., 2007).
(B) Ubl1 (residues 19–114) of MHV (PDB entry: 2MOA; Keane and Giedroc, 2013).
(C) Ubl2 (residues 1–60) of SARS-CoV (PDB entry: 2FE8; Ratia et al., 2006). (D)
Human ubiquitin (PDB entry: 1UBQ; Vijay-Kumar et al., 1987). (E) Human
interferon-stimulated gene 15 (hISG15; PDB entry: 1Z2M; Narasimhan et al.,
2005). hISG15 contains two linked ubiquitin-like domains; here, the N-terminal
Ubl domain is shown. (F) The Ras-interacting domain of RalGDS (PDB entry:
1LFD; Huang et al., 1998). The N and C termini of all structures are marked.
All α and 3_10 (η) helices are labeled and shown in cyan. β strands are in purple
and loops are in brown. This figure and Figs. 3, 5, and 8 were generated
by using Chimera (Pettersen et al., 2004).




Fig. 3. Crystal structure of the papain-like protease domain 1 (PL1pro) of TGEV.
Cartoon view of the overall structure (PDB entry: 3MP2; Wojdyla et al., 2010).
The thumb, fingers, and palm subdomains are shown in blue, brown, and
green, respectively. The Cα atoms of the catalytic triad residues
(Cys32–His183–Asp196) are displayed as yellow, blue, and red spheres.
Residue Gln27 contributing to the oxyanion hole is shown in ball & stick style.
Ile155, Thr209, and Tyr175 forming the S4 pocket are labeled; Ile155 is in
black and the latter two are in red. The N and C termini of the PL1pro are
indicated.


Fig. 4. Structure of the MERS-CoV macrodomain I (Mac1, X domain) in
complex with ADP-ribose (ADPr) (PDB entry: 5HOL). The protein features an
α/β/α sandwich fold. The central β sheet with the strand order
β1–β2–β7–β6–β3–β5–β4 is shown in purple; β1 and β4 are labeled. An Fo–Fc
omit difference map of ADPr is shown in black (contoured at 4.0 σ). The ADPr
itself is displayed as brown sticks. The five regions (blue) relating to ADPr
binding are marked by Roman numerals I–V. Fixing the two ends of the ADPr,
Asp21 and Asn39 are displayed by thicker red sticks. The O2' of ADPr forms a
hydrogen bond with a water molecule (H2O 308; green sphere) that is
stabilized by the side-chain of Asn155. The “GGG” triple-glycine motif is
displayed in black. H2O 310 (green sphere) corresponds to a water molecule
that has been proposed to mediate a nucleophilic attack onto the C1'' atom of
the ADPr in the de-MARylation reaction catalyzed by the VEEV X domain (Li et
al., 2016a). The N and C termini of the X domain are marked. This figure and
Fig. 6 were prepared using Pymol (Schrödinger; http://www.pymol.org/).


Fig. 5. Structures (in cartoon style) of the macrodomains II (Mac2) and III
(Mac3), of the Domain Preceding Ubl2 and PL2pro (DPUP) of SARS-CoV and
MHV, as well as of the frataxin-like fold protein Yfh1. (A) and (B) Mac2 and
Mac3 (PDB entry: 2W2G; Tan et al., 2009). Both domains possess the α/β/α
sandwich fold. The central six β strands in the order β1–β6–β5–β2–β4–β3 are
displayed in purple. A predominantly positively charged surface patch
(Lys563+Lys565+Lys568+Glu571; Nsp3 numbering) of Mac3 involved in
binding oligo(G) (Kusov et al., 2015) is labeled. (C) The SARS-CoV DPUP
NMR structure (PDB entry: 2KQW; Johnson et al., 2010). (D) The MHV DPUP
X-ray crystal structure (PDB entry: 4YPT; Chen et al., 2015). (E) Structure of
the yeast frataxin-like protein Yfh1, as determined by NMR spectroscopy (PDB
entry: 2GA5; He et al., 2004). All structures shown in (C), (D), and (E) display
the typical frataxin-like fold. Two α helices located at the N and C termini of
each structure form one plane and the β sheet forms the other plane. The
negatively charged residues (Asp or Glu) in the first α helix (α1) are shown in
red (in (C), (D), and (E)); they are possibly involved in binding metal ions. The
N and C termini of all structures are marked.


Fig. 6. Structure of the SARS-CoV papain-like protease 2 (PL2pro) in complex
with Lys48-linked diubiquitin (PDB entry: 5E6J; Békés et al., 2016). The Ubl2
is shown as a grey cartoon. The catalytic domain (PL2pro) is displayed in
surface view. The thumb, fingers, and palm subdomains are shown in blue,
light brown, and green, respectively. The blocking loop 2 (BL2) is depicted in
red. The Lys48-linked diubiquitin is displayed as a light-blue cartoon. Lys48 of
Ub1 is linked to the C-terminal Gly75 of Ub2 (black sticks) via a triazole (red
sticks). The N and C termini of Ub1 (N1, C1) as well as the N terminus of Ub2
(N2) are marked. The conserved hydrophobic patches (Ile44, Ala46, Gly47) of
Ub1 and Ub2 are indicated by purple and orange dots, respectively. The
residue Phe70 (yellow) interacting with the hydrophobic patch of Ub2 is
labeled. The C-terminal Arg–Leu–Arg–Gly–Gly residues (RLRGG) of Ub1 are
shown in ball & stick style (purple). P3-Arg and P5-Arg are marked.


Fig. 7. Recently described inhibitors of the CoV PL2pro. (A) Structural formula
of the purine derivative 8-(trifluoromethyl)-9H-purin-6-amine (compound 4).
This compound is a competitive MERS-CoV PL2pro inhibitor (Lee et al., 2015).
It is also active against SARS-CoV PL2pro but acts as an allosteric inhibitor in
this case. (B) A natural-product chalcone, compound 6 from the perennial plant
Angelica keiskei, inhibits the SARS-CoV Mpro (3CLpro) and PL2pro in vitro (Park
et al., 2016).


Fig. 8. NMR structure of the nucleic acid-binding (NAB) domain of SARS-CoV
(cartoon style; PDB entry: 2K87; Serrano et al., 2009). The order of
secondary-structure elements is β1–β2–β3–α1–β4–β5–η1–η2–β6–β7–α2–β8.
The overall structure of NAB represents a unique fold. The residues involved in
RNA binding (Lys75, Lys76, Lys99, and Arg106) are displayed in blue. The N
and C termini of the NAB domain are labeled.


Fig. 9. Multiple sequence alignment of the 3Ecto and the N-terminal portion of
the Y1+CoV-Y domains. The conserved cysteines in 3Ecto as well as
cysteines and histidines in the N-terminal portion of Y1 are marked by triangles.
Two glycosylation sites in the 3Ecto domain of SARS-CoV (Asn1431 and
Asn1434; Harcourt et al., 2004) are indicated by asterisks. The corresponding
sequence accession numbers are: SARS-CoV, GenBank: AY274119.3;
MERS-CoV, GenBank: JX869059.2; MHV, GenBank: AY700211.1; HCoV
NL63, GenBank: AY567487.2; IBV, GenBank: M95169.1. The figure was
generated using the program ESPript (Gouet et al., 1999).

Table 1

Structural information on CoV Nsp3 domains and regions.

Domain/region          Res. no.# / MW% (kD)
Ubl1                   1-112 / 12.6
Acidic domain (HVR)    113-183 / 8.3
PL1pro *1              n. a. / 23.6
Mac1 (X domain)        184-365 / 19.5
Mac2 (SUD-N)           389-524 / 15.2
Mac3 (SUD-M)           525-652 / 14.0
DPUP (SUD-C)           653-720 / 7.8
Ubl2–PL2pro            723-1036 / 35.2
PL2pro                 n. a. / 28.6
NAB                    1066-1180 / 13.0
βSM (G2M)              1203-1318 / 12.5
TM1                    1391-1413† / 2.4
3Ecto                  1414-1495 / 9.0
TM2                    1496-1518† / 2.7
AH1                    1523-1545† / 2.7
Y1 + CoV-Y             1546-1922 / 41.9 (n. d.)

Domain/region          Method   Coronavirus            Reference
Ubl1                   NMR      SARS-CoV               Serrano et al. (2007)
Ubl1                   NMR      MHV                    Keane & Giedroc (2013)
PL1pro                 X-ray    TGEV                   Wojdyla et al. (2010)
Mac1 (X domain)        X-ray    SARS-CoV               Saikatendu et al. (2005)
Mac1 (X domain)        X-ray    SARS-CoV               Egloff et al. (2006)
Mac1 (X domain)        X-ray    HCoV-229E, IBV         Xu et al. (2009)
Mac1 (X domain)        X-ray    HCoV-229E, IBV         Piotrowski et al. (2009)
Mac1 (X domain)        X-ray    FCoV                   Wojdyla et al. (2009)
Mac1 (X domain)        X-ray    MERS-CoV               Cho et al. (2016)
Mac2 (SUD-N)           X-ray    SARS-CoV *2            Tan et al. (2009)
Mac3 (SUD-M)           NMR      SARS-CoV               Chatterjee et al. (2009)
Mac3 (SUD-M)           X-ray    SARS-CoV *2            Tan et al. (2009)
Mac3 (SUD-M)           NMR      SARS-CoV *3            Johnson et al. (2010)
DPUP (SUD-C)           NMR      SARS-CoV               Johnson et al. (2010)
DPUP (SUD-C)           X-ray    MHV *4                 Chen et al. (2015)
DPUP (SUD-C)           NMR      HKU9                   Hammond et al. (2017)
Ubl2–PL2pro / PL2pro   X-ray    SARS-CoV               Ratia et al. (2006)
Ubl2–PL2pro / PL2pro   X-ray    SARS-CoV + human Ub    Chou et al. (2014)
Ubl2–PL2pro / PL2pro   X-ray    SARS-CoV + human Ub    Ratia et al. (2014)
Ubl2–PL2pro / PL2pro   X-ray    SARS-CoV + diUb        Békés et al. (2016)
Ubl2–PL2pro / PL2pro   X-ray    SARS-CoV + hISG15 *5   Daczkowski et al. (2017)
Ubl2–PL2pro / PL2pro   X-ray    SARS-CoV + mISG15 *6   Daczkowski et al. (2017)
Ubl2–PL2pro / PL2pro   X-ray    MERS-CoV               Lei et al. (2014)
Ubl2–PL2pro / PL2pro   X-ray    MERS-CoV               Lee et al. (2015)
Ubl2–PL2pro / PL2pro   X-ray    MERS-CoV + human Ub    Bailey-Elkin et al. (2014)
Ubl2–PL2pro / PL2pro   X-ray    MERS-CoV + human Ub    Lei & Hilgenfeld (2016)
Ubl2–PL2pro / PL2pro   X-ray    MERS-CoV               Clasman et al. (2017)
Ubl2–PL2pro / PL2pro   X-ray    IBV                    Kong et al. (2015)
Ubl2–PL2pro / PL2pro   X-ray    MHV *4                 Chen et al. (2015)
NAB                    NMR      SARS-CoV               Serrano et al. (2009)

#: Nsp3 of the SARS-CoV strain TOR2 (GenBank: AY274119.3); %: molecular mass (kD); n. d.: structure is not
determined; *1: absent in SARS-CoV; n. a.: does not apply (residue numbers are only given for SARS-CoV); *2:
Mac2–Mac3 structure; *3: Mac3–DPUP structure; *4: DPUP–Ubl2–PL2pro structure; *5: Ubl2–PL2pro–C-terminal Ubl
domain of human ISG15 structure; *6: Ubl2–PL2pro–C-terminal Ubl domain of mouse ISG15 structure; †: regions are
predicted by TMHMM server v. 2.0 (Krogh et al., 2001). TM1 and TM2 are transmembrane regions while AH1 is not
(Oostra et al., 2008).
Table 2.
Cleavage sites of PL1pro and PL2pro in CoVs and the P5–P2' residues for each cleavage
site.

Virus        Nsp1|2             Nsp2|3             Nsp3|4      Reference
TGEV         RTGRG AI           NKMGG GD           PKSGS GF    Putics et al. (2006)
HCoV NL63    GHGAG SV           TKLAG GK           AKQGA GF    Chen et al. (2007)
HCoV 229E    KRGGG NV           TKAAG GK           AKQGA GD    Ziebuhr et al. (2007)
             PL1pro > PL2pro    PL1pro < PL2pro
MHV          KGYRG VK           RFPCA GK           SLKGG AV    Bonilla et al. (1997);
SARS-CoV     ELNGG AV           RLKGG AP           SLKGG KI    Harcourt et al. (2004)
MERS-CoV     KLIGG DV           RLKGG AP           KIVGG AP    Yang et al. (2014)
IBV                                                            Lim et al. (2000)

n. d., not determined; *1: absence of PL1pro; *2: partial presence of PL1pro; /: absence of the cleavage site.
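As a reading aid for Table 2, the sketch below counts, for each virus, how many of the three cleavage sites carry glycine at both P2 and P1, the residues accommodated by the small S2 and S1 pockets of the PLpro. The sequences are transcribed from Table 2 as far as they are legible (IBV is omitted because no sequences are given for it); the assignment of the row cited to Bonilla et al. (1997) to MHV is inferred from that reference, and the helper name `p2_p1` is our own.

```python
# P5-P2' residues per cleavage site (Nsp1|2, Nsp2|3, Nsp3|4), transcribed from Table 2.
# The space separates the P side (P5..P1) from the P' side (P1', P2').
cleavage_sites = {
    "TGEV":      ["RTGRG AI", "NKMGG GD", "PKSGS GF"],
    "HCoV NL63": ["GHGAG SV", "TKLAG GK", "AKQGA GF"],
    "HCoV 229E": ["KRGGG NV", "TKAAG GK", "AKQGA GD"],
    "MHV":       ["KGYRG VK", "RFPCA GK", "SLKGG AV"],  # row inferred from its reference
    "SARS-CoV":  ["ELNGG AV", "RLKGG AP", "SLKGG KI"],
    "MERS-CoV":  ["KLIGG DV", "RLKGG AP", "KIVGG AP"],
}


def p2_p1(site: str) -> str:
    """Return the P2 and P1 residues (last two residues before the cleaved bond)."""
    p_side, _p_prime_side = site.split()
    return p_side[-2:]


# Number of sites per virus with Gly at both P2 and P1.
gg_counts = {
    virus: sum(p2_p1(site) == "GG" for site in sites)
    for virus, sites in cleavage_sites.items()
}

for virus, n in gg_counts.items():
    print(f"{virus}: {n} of {len(cleavage_sites[virus])} sites with Gly-Gly at P2-P1")
```

With these sequences, all three sites of SARS-CoV and MERS-CoV end in Gly-Gly, whereas the other viruses listed show more variation at P2 and P1.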
Nonstructural protein 3 (~200 kD) is a multifunctional protein comprising up to 16 
different domains and regions. 


Nsp3 binds to viral RNA, nucleocapsid protein, as well as other viral proteins, and 
participates in polyprotein processing. 


Through its de-ADP-ribosylating, de-ubiquitinating, and de-ISGylating activities, Nsp3 
counteracts host innate immunity. 


Structural data are available for the N-terminal two thirds of Nsp3, but domains in the 
remainder are poorly characterized. 


The papain-like protease of Nsp3 is an established target for new antivirals. 


